[ 
https://issues.apache.org/jira/browse/KUDU-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720732#comment-16720732
 ] 

Adar Dembo commented on KUDU-2387:
----------------------------------

I spent some time independently looking into this before realizing there was 
already a JIRA filed. Below are my notes, stored for posterity.

When {{ConnectToCluster}} fails to find a leader master, it'll invoke its 
deferred's callback with an exception of type {{RecoverableException}}. As far 
as I can tell, whether or not the request is retried depends on what errbacks 
have been chained to that deferred. If we reached {{ConnectToCluster}} via 
{{sendRpcToTablet()}}, it ensures that a {{RetryRpcErrback}} is chained to the 
deferred, and when invoked, that errback looks at the exception type, and if 
it's recoverable, it schedules a retry of the RPC. So the important observation 
here is that invoking a deferred with a {{RecoverableException}} isn't enough 
for an RPC to be retried; we must ensure that your call chain includes 
{{RetryRpcErrback}} (usually done by calling {{sendRpcToTablet()}}).

So what's going on? Well, {{exportAuthenticationCredentials()}} uses 
{{ConnectToCluster}} via {{getMasterTableLocationsPB()}}, but there's no 
wrapping in {{sendRpcToTablet()}}. So I don't see who on the call chain is 
going to retry the {{RecoverableException}}, and it makes sense that the 
exception just propagates up the stack and fails the application. BTW, 
{{getHiveMetastoreConfig()}} is equally vulnerable to this.

Here's the log that helped me make sense of this. It's a failure in 
DefaultSourceTest with the cluster logging stripped away:
{noformat}
20:09:07.846 [INFO - Test worker] (RandomUtils.java:46) Using random seed: 
1543896547846
20:09:07.847 [INFO - Test worker] (KuduTestHarness.java:137) Creating a new 
MiniKuduCluster...
20:09:07.847 [INFO - Test worker] (MiniKuduCluster.java:146) Using the temp 
directory defined by TEST_TMPDIR: 
/data/somelongdirectorytoavoidrpathissues/src/kudutest
20:09:07.847 [INFO - Test worker] (KuduBinaryLocator.java:52) Using Kudu binary 
directory specified by system property 'kuduBinDir': 
/data/somelongdirectorytoavoidrpathissues/src/kudu/java/../build/latest/bin
20:09:07.847 [INFO - Test worker] (MiniKuduCluster.java:197) Starting process: 
[/data/somelongdirectorytoavoidrpathissues/src/kudu/java/../build/latest/bin/kudu,
 test, mini_cluster, --serialization=pb]
20:09:08.266 [INFO - Test worker] (KuduTestHarness.java:139) Creating a new 
Kudu client...
20:09:08.337 [WARN - New I/O worker #587] (ConnectToCluster.java:278) None of 
the provided masters 
127.27.137.61:53849,127.27.137.60:37363,127.27.137.62:55005 is a leader; will 
retry
20:09:08.373 [ERROR - Test worker] (RetryRule.java:76) 
testUpsertRowsIgnoreNulls(org.apache.kudu.spark.kudu.DefaultSourceTest): failed 
run 1
java.security.PrivilegedActionException: 
org.apache.kudu.client.NoLeaderFoundException: Master config 
(127.27.137.61:53849,127.27.137.60:37363,127.27.137.62:55005) has no leader.
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:360)
        at org.apache.kudu.spark.kudu.KuduContext.<init>(KuduContext.scala:122)
        at org.apache.kudu.spark.kudu.KuduContext.<init>(KuduContext.scala:65)
        at 
org.apache.kudu.spark.kudu.KuduTestSuite$class.setUpBase(KuduTestSuite.scala:129)
        at 
org.apache.kudu.spark.kudu.DefaultSourceTest.setUpBase(DefaultSourceTest.scala:38)
        at sun.reflect.GeneratedMethodAccessor60.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:483)
        at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
        at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
        at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
        at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
        at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
        at 
org.apache.kudu.test.junit.RetryRule$RetryStatement.evaluate(RetryRule.java:72)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
        at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
        at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
        at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106)
        at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
        at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
        at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66)
        at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:483)
        at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
        at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
        at 
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
        at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
        at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
        at 
org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:117)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:483)
        at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
        at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
        at 
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:155)
        at 
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:137)
        at 
org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
        at 
org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
        at 
org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at 
org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.kudu.client.NoLeaderFoundException: Master config 
(127.27.137.61:53849,127.27.137.60:37363,127.27.137.62:55005) has no leader.
        at 
org.apache.kudu.client.KuduException.transformException(KuduException.java:110)
        at 
org.apache.kudu.client.KuduClient.joinAndHandleException(KuduClient.java:365)
        at 
org.apache.kudu.client.KuduClient.exportAuthenticationCredentials(KuduClient.java:316)
        at 
org.apache.kudu.spark.kudu.KuduContext$$anon$1.run(KuduContext.scala:124)
        at 
org.apache.kudu.spark.kudu.KuduContext$$anon$1.run(KuduContext.scala:122)
        ... 56 more
        Suppressed: org.apache.kudu.client.KuduException$OriginalException: 
Original asynchronous stack trace
                at 
org.apache.kudu.client.ConnectToCluster.incrementCountAndCheckExhausted(ConnectToCluster.java:279)
                at 
org.apache.kudu.client.ConnectToCluster.access$100(ConnectToCluster.java:47)
                at 
org.apache.kudu.client.ConnectToCluster$ConnectToMasterCB.call(ConnectToCluster.java:323)
                at 
org.apache.kudu.client.ConnectToCluster$ConnectToMasterCB.call(ConnectToCluster.java:312)
                at com.stumbleupon.async.Deferred.doCall(Deferred.java:1280)
                at 
com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1259)
                at com.stumbleupon.async.Deferred.callback(Deferred.java:1002)
                at 
org.apache.kudu.client.KuduRpc.handleCallback(KuduRpc.java:247)
                at org.apache.kudu.client.KuduRpc.callback(KuduRpc.java:294)
                at 
org.apache.kudu.client.RpcProxy.responseReceived(RpcProxy.java:269)
                at org.apache.kudu.client.RpcProxy.access$000(RpcProxy.java:59)
                at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:131)
                at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:127)
                at 
org.apache.kudu.client.Connection.messageReceived(Connection.java:391)
                at 
org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
                at 
org.apache.kudu.client.Connection.handleUpstream(Connection.java:243)
                at 
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
                at 
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
                at 
org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.messageReceived(ReadTimeoutHandler.java:184)
                at 
org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
                at 
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
                at 
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
                at 
org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
                at 
org.apache.kudu.shaded.org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:70)
                at 
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
                at 
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
                at 
org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
                at 
org.apache.kudu.shaded.org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
                at 
org.apache.kudu.shaded.org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
                at 
org.apache.kudu.shaded.org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
                at 
org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
                at 
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
                at 
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
                at 
org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
                at 
org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
                at 
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
                at 
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
                at 
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
                at 
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
                at 
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
                at 
org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
                at 
org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
                at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
                at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
                ... 1 more
{noformat}

> exportAuthenticationCredentials does not retry connectToCluster
> ---------------------------------------------------------------
>
>                 Key: KUDU-2387
>                 URL: https://issues.apache.org/jira/browse/KUDU-2387
>             Project: Kudu
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>
> TestSecurity has the following TODO:
> {code}
>     // TODO(todd): it seems that exportAuthenticationCredentials() doesn't 
> properly retry
>     // in the case that there is no leader, even though 
> NoLeaderFoundException is a RecoverableException.
>     // So, we have to use a hack of calling listTabletServers, which _does_ 
> properly retry,
>     // in order to wait for the masters to elect a leader.
> {code}
> It seems like this causes occasional failures of tests like KuduRDDTest -- I 
> saw a case where the client failed to connect due to a negotiation timeout, 
> and then didn't retry at all. It's not clear why the 3-second negotiation 
> timeout was insufficient in this test case but likely just machine load or 
> somesuch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to