Will Berkeley created KUDU-2589:
-----------------------------------

             Summary: ITClient is flaky under stress when leadership changes
                 Key: KUDU-2589
                 URL: https://issues.apache.org/jira/browse/KUDU-2589
             Project: Kudu
          Issue Type: Bug
    Affects Versions: 1.8.0
            Reporter: Will Berkeley


Saw this failure in the ITClient test:

{noformat}
00:34:51.362 [DEBUG - New I/O worker #10] (AsyncKuduScanner.java:492) Can not open scanner
org.apache.kudu.client.NonRecoverableException: Tablet hasn't heard from leader, or there hasn't been a stable leader for: 0.757s secs, (max is 0.750s):
        at org.apache.kudu.client.RpcProxy.dispatchTSError(RpcProxy.java:320)
        at org.apache.kudu.client.RpcProxy.responseReceived(RpcProxy.java:242)
        at org.apache.kudu.client.RpcProxy.access$000(RpcProxy.java:59)
        at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:131)
        at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:127)
        at org.apache.kudu.client.Connection.messageReceived(Connection.java:391)
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.apache.kudu.client.Connection.handleUpstream(Connection.java:243)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.jboss.netty.handler.timeout.ReadTimeoutHandler.messageReceived(ReadTimeoutHandler.java:184)
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
        at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:70)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{noformat}

There was a new leader elected just before:

{noformat}
00:34:50.953 [INFO - cluster stderr printer] (MiniKuduCluster.java:526) I0925 00:34:50.953722 18257 catalog_manager.cc:3758] T 07e99535cba24d8e991829485de22275 P 060cf49269fb4f6f901696d741e69303 reported cstate change: term changed from 1 to 2, leader changed from acaedc4ec505489dbc853d8b32bfc147 (127.16.196.2) to 060cf49269fb4f6f901696d741e69303 (127.16.196.3). New cstate: current_term: 2 leader_uuid: "060cf49269fb4f6f901696d741e69303" committed_config { opid_index: -1 OBSOLETE_local: false peers { permanent_uuid: "acaedc4ec505489dbc853d8b32bfc147" member_type: VOTER last_known_addr { host: "127.16.196.2" port: 58760 } health_report { overall_health: UNKNOWN } } peers { permanent_uuid: "060cf49269fb4f6f901696d741e69303" member_type: VOTER last_known_addr { host: "127.16.196.3" port: 37181 } health_report { overall_health: HEALTHY } } peers { permanent_uuid: "2f0ee942338c4fb7bce908b1c32f0bcb" member_type: VOTER last_known_addr { host: "127.16.196.1" port: 47865 } health_report { overall_health: UNKNOWN } } }
{noformat}

but it seems the 0.75-second window the async scanner waits for a stable leader partially overlapped with the election. Working back from the failure at 00:34:51.362, the wait started around 00:34:50.605, roughly 0.35s before the new leader was reported at 00:34:50.953, so only about 0.41s of the window remained after the election, which wasn't enough to learn who the new leader was.
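
As a minimal sketch of how the test could be hardened (not the actual ITClient code; the table name, attempt count, and backoff below are assumptions for illustration), the scan could simply be retried so a concurrent election has time to settle:

{code:java}
import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.KuduException;
import org.apache.kudu.client.KuduScanner;
import org.apache.kudu.client.KuduTable;
import org.apache.kudu.client.RowResultIterator;

public class ScanRetrySketch {
  // Arbitrary illustration values, not taken from ITClient.
  static final int MAX_ATTEMPTS = 5;
  static final long BACKOFF_MS = 1000;

  static long countRows(KuduClient client, String tableName) throws Exception {
    KuduTable table = client.openTable(tableName);
    for (int attempt = 1; ; attempt++) {
      KuduScanner scanner = client.newScannerBuilder(table).build();
      try {
        long rows = 0;
        while (scanner.hasMoreRows()) {
          RowResultIterator batch = scanner.nextRows();
          rows += batch.getNumRows();
        }
        return rows;
      } catch (KuduException e) {
        // The "no stable leader" failure above surfaces as a
        // NonRecoverableException, a subclass of KuduException. Back off so
        // the election can settle, then rerun the scan a bounded number of
        // times before giving up.
        if (attempt >= MAX_ATTEMPTS) {
          throw e;
        }
        Thread.sleep(BACKOFF_MS);
      } finally {
        try {
          scanner.close();
        } catch (KuduException closeException) {
          // Best-effort cleanup; ignore close failures.
        }
      }
    }
  }
}
{code}

Alternatively, if the root cause is just a too-tight deadline, giving the client more headroom (e.g. via KuduClientBuilder#defaultOperationTimeoutMs) might be enough on its own.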


