[ https://issues.apache.org/jira/browse/IMPALA-5729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Tauber-Marshall resolved IMPALA-5729.
--------------------------------------------
    Resolution: Cannot Reproduce

> Kudu may crash in minicluster if clock becomes unsynchronized
> -------------------------------------------------------------
>
>                 Key: IMPALA-5729
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5729
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Infrastructure
>    Affects Versions: Impala 2.10.0
>            Reporter: Henry Robinson
>            Priority: Major
>              Labels: flaky-test, kudu
>
> See e.g. https://jenkins.impala.io/job/gerrit-verify-dryrun/937/consoleFull. 
> {code}
> 00:44:28 ] E   HiveServer2Error: AnalysisException: Error opening Kudu table 
> 'impala::tpch_kudu.lineitem', Kudu error: can not complete before timeout: 
> KuduRpc(method=GetTableSchema, tablet=null, attempt=94, 
> DeadlineTracker(timeout=180000, elapsed=179403), Traces: [0ms] querying 
> master, [0ms] Sub rpc: ConnectToMaster sending RPC to server 
> master-127.0.0.1:7051, [0ms] Sub rpc: ConnectToMaster received from server 
> master-127.0.0.1:7051 response Network error: [peer master-127.0.0.1:7051] 
> connection closed, [1ms] delaying RPC due to Service unavailable: Master 
> config (127.0.0.1:7051) has no leader. Exceptions received: 
> org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] 
> connection closed, [22ms] querying master, [22ms] Sub rpc: ConnectToMaster 
> sending RPC to server master-127.0.0.1:7051, [22ms] Sub rpc: ConnectToMaster 
> received from server master-127.0.0.1:7051 response Network error: [peer 
> master-127.0.0.1:7051] connection closed, [23ms] delaying RPC due to Service 
> unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions 
> received: org.apache.kudu.client.RecoverableException: [peer 
> master-127.0.0.1:7051] connection closed, [42ms] querying master, [42ms] Sub 
> rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, [42ms] Sub 
> rpc: ConnectToMaster received from server master-127.0.0.1:7051 response 
> Network error: [peer master-127.0.0.1:7051] connection closed, [43ms] 
> delaying RPC due to Service unavailable: Master config (127.0.0.1:7051) has 
> no leader. Exceptions received: org.apache.kudu.client.RecoverableException: 
> [peer master-127.0.0.1:7051] connection closed, [62ms] querying master, 
> [63ms] Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, 
> [63ms] Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 
> response Network error: [peer master-127.0.0.1:7051] connection closed, 
> [63ms] delaying RPC due to Service unavailable: Master config 
> (127.0.0.1:7051) has no leader. Exceptions received: 
> org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] 
> connection closed, [82ms] querying master, [82ms] Sub rpc: ConnectToMaster 
> sending RPC to server master-127.0.0.1:7051, [82ms] Sub rpc: ConnectToMaster 
> received from server master-127.0.0.1:7051 response Network error: [peer 
> master-127.0.0.1:7051] connection closed, [83ms] delaying RPC due to Service 
> unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions 
> received: org.apache.kudu.client.RecoverableException: [peer 
> master-127.0.0.1:7051] connection closed, [102ms] querying master, [102ms] 
> Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, [103ms] 
> Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 response 
> Network error: [peer master-127.0.0.1:7051] connection closed, [103ms] 
> delaying RPC due to Service unavailable: Master config (127.0.0.1:7051) has 
> no leader. Exceptions received: org.apache.kudu.client.RecoverableException: 
> [peer master-127.0.0.1:7051] connection closed, [162ms] querying master, 
> [162ms] Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, 
> [162ms] Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 
> response Network error: [peer master-127.0.0.1:7051] connection closed, 
> [163ms] delaying RPC due to Service unavailable: Master config 
> (127.0.0.1:7051) has no leader. Exceptions received: 
> org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] 
> connection closed, [242ms] querying master, [242ms] Sub rpc: ConnectToMaster 
> sending RPC to server master-127.0.0.1:7051, [242ms] Sub rpc: ConnectToMaster 
> received from server master-127.0.0.1:7051 response Network error: [peer 
> master-127.0.0.1:7051] connection closed, [243ms] delaying RPC due to Service 
> unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions 
> received: org.apache.kudu.client.RecoverableException: [peer 
> master-127.0.0.1:7051] connection closed, [362ms] querying master, [362ms] 
> Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, [362ms] 
> Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 response 
> Network error: [peer master-127.0.0.1:7051] connection closed, [363ms] 
> delaying RPC due to Service unavailable: Master config (127.0.0.1:7051) has 
> no leader. Exceptions received: org.apache.kudu.client.RecoverableException: 
> [peer master-127.0.0.1:7051] connection closed, [763ms] querying master, 
> [763ms] Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, 
> [763ms] Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 
> response Network error: [peer master-127.0.0.1:7051] connection closed, 
> [764ms] delaying RPC due to Service unavailable: Master config 
> (127.0.0.1:7051) has no leader. Exceptions received: 
> org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] 
> connection closed, [962ms] querying master, [962ms] Sub rpc: ConnectToMaster 
> sending RPC to server master-127.0.0.1:7051, [963ms] Sub rpc: ConnectToMaster 
> received from server master-127.0.0.1:7051 response Network error: [peer 
> master-127.0.0.1:7051] connection closed, [963ms] delaying RPC due to Service 
> unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions 
> received: org.apache.kudu.client.RecoverableException: [peer 
> master-127.0.0.1:7051] connection closed, [2862ms] querying master, [2863ms] 
> Sub rpc: ConnectToMaster sending RPC to server master-127.0.0.1:7051, 
> [2863ms] Sub rpc: ConnectToMaster received from server master-127.0.0.1:7051 
> response Network error: [peer master-127.0.0.1:7051] connection closed, 
> [2864ms] delaying RPC due to Service unavailable: Master config 
> (127.0.0.1:7051) has no leader. Exceptions received: 
> org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] 
> connection closed, [5842ms] querying master, [5843ms] Sub rpc: 
> ConnectToMaster sending RPC to server master-127.0.0.1:7051, [5843ms] Sub 
> rpc: ConnectToMaster received from server master-127.0.0.1:7051 response 
> Network error: [peer master-127.0.0.1:7051] connection closed, [5844ms] 
> delaying RPC due to Service unavailable: Master config (127.0.0.1:7051) has 
> no leader. Exceptions received: org.apache.kudu.client.RecoverableException: 
> [peer master-127.0.0.1:7051] connection closed, [9102ms] querying master, 
> [9103ms] Sub rpc: ConnectToMaster sending RPC to server 
> master-127.0.0.1:7051, [9103ms] Sub rpc: ConnectToMaster received from server 
> master-127.0.0.1:7051 response Network error: [peer master-127.0.0.1:7051] 
> connection closed, [9104ms] delaying RPC due to Service unavailable: Master 
> config (127.0.0.1:7051) has no leader. Exceptions received: 
> org.apache.kudu.client.RecoverableException: [peer master-127.0.0.1:7051] 
> connection closed, [9342ms] querying master, [9343ms] Sub rpc: 
> ConnectToMaster sending RPC to server master-127.0.0.1:7051, [9343ms] Sub 
> rpc: ConnectToMaster received from server master-127.0.0.1:7051 response 
> Network error: [peer master-127.0.0.1:7051] connection closed, [9344ms] 
> delaying RPC due to Service unavailable: Master config (127.0.0.1:7051) has 
> no leader. Exceptions received: org.apache.kudu.client.RecoverableException: 
> [peer master-127.0.0.1:7051] connection closed, [13142ms] querying master, 
> [13142ms] Sub rpc: ConnectToMaster sending RPC to server 
> master-127.0.0.1:7051, [13142ms] Sub rpc: ConnectToMaster received from 
> server master-127.0.0.1:7051 response Network error: [peer 
> master-127.0.0.1:7051] connection closed, [13143ms] delaying RPC due to 
> Service unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions 
> received: org.apache.kudu.client.RecoverableException: [peer 
> master-127.0.0.1:7051] connection closed, [16082ms] querying master, 
> [16083ms] Sub rpc: ConnectToMaster sending RPC to server 
> master-127.0.0.1:7051, [16083ms] Sub rpc: ConnectToMaster received from 
> server master-127.0.0.1:7051 response Network error: [peer 
> master-127.0.0.1:7051] connection closed, [16084ms] delaying RPC due to 
> Service unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions 
> received: org.apache.kudu.client.RecoverableException: [peer 
> master-127.0.0.1:7051] connection closed, [19702ms] querying master, 
> [19702ms] Sub rpc: ConnectToMaster sending RPC to server 
> master-127.0.0.1:7051, [19703ms] Sub rpc: ConnectToMaster received from 
> server master-127.0.0.1:7051 response Network error: [peer 
> master-127.0.0.1:7051] connection closed, [19703ms] delaying RPC due to 
> Service unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions 
> received: org.apache.kudu.client.RecoverableException: [peer 
> master-127.0.0.1:7051] connection closed, [21282ms] querying master, 
> [21282ms] Sub rpc: ConnectToMaster sending RPC to server 
> master-127.0.0.1:7051, [21282ms] Sub rpc: ConnectToMaster received from 
> server master-127.0.0.1:7051 response Network error: [peer 
> master-127.0.0.1:7051] connection closed, [21283ms] delaying RPC due to 
> Service unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions 
> received: org.apache.kudu.client.RecoverableException: [peer 
> master-127.0.0.1:7051] connection closed, [22702ms] querying master, 
> [22702ms] Sub rpc: ConnectToMaster sending RPC to server 
> master-127.0.0.1:7051, [22702ms] Sub rpc: ConnectToMaster received from 
> server master-127.0.0.1:7051 response Network error: [peer 
> master-127.0.0.1:7051] connection closed, [22703ms] delaying RPC due to 
> Service unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions 
> received: org.apache.kudu.client.RecoverableException: [peer 
> master-127.0.0.1:7051] connection closed, [24702ms] querying master, 
> [24702ms] Sub rpc: ConnectToMaster sending RPC to server 
> master-127.0.0.1:7051, [24703ms] Sub rpc: ConnectToMaster received from 
> server master-127.0.0.1:7051 response Network error: [peer 
> master-127.0.0.1:7051] connection closed, [24703ms] delaying RPC due to 
> Service unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions 
> received: org.apache.kudu.client.RecoverableException: [peer 
> master-127.0.0.1:7051] connection closed, [24822ms] querying master, 
> [24822ms] Sub rpc: ConnectToMaster sending RPC to server 
> master-127.0.0.1:7051, [24823ms] Sub rpc: ConnectToMaster received from 
> server master-127.0.0.1:7051 response Network error: [peer 
> master-127.0.0.1:7051] connection closed, [24824ms] delaying RPC due to 
> Service unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions 
> received: org.apache.kudu.client.RecoverableException: [peer 
> master-127.0.0.1:7051] connection closed, [27362ms] querying master, 
> [27362ms] Sub rpc: ConnectToMaster sending RPC to server 
> master-127.0.0.1:7051, [27363ms] Sub rpc: ConnectToMaster received from 
> server master-127.0.0.1:7051 response Network error: [peer 
> master-127.0.0.1:7051] connection closed, [27363ms] delaying RPC due to 
> Service unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions 
> received: org.apache.kudu.client.RecoverableException: [peer 
> master-127.0.0.1:7051] connection closed, [28862ms] querying master, 
> [28863ms] Sub rpc: ConnectToMaster sending RPC to server 
> master-127.0.0.1:7051, [28863ms] Sub rpc: ConnectToMaster received from 
> server master-127.0.0.1:7051 response Network error: [peer 
> master-127.0.0.1:7051] connection closed, [28864ms] delaying RPC due to 
> Service unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions 
> received: org.apache.kudu.client.RecoverableException: [peer 
> master-127.0.0.1:7051] connection closed, [30023ms] querying master, 
> [30023ms] Sub rpc: ConnectToMaster sending RPC to server 
> master-127.0.0.1:7051, [30023ms] Sub rpc: ConnectToMaster received from 
> server master-127.0.0.1:7051 response Network error: [peer 
> master-127.0.0.1:7051] connection closed, [30024ms] delaying RPC due to 
> Service unavailable: Master config (127.0.0.1:7051) has no leader. Exceptions 
> received: org.apache.kudu.client.RecoverableException: [peer 
> master-127.0.0.1:7051] connection closed, [30402ms] trace too long, truncated)
> {code}
> In another example, one tablet server's logs say:
> {code}
> W0726 21:13:15.084012 21047 heartbeater.cc:499] Failed to heartbeat to 
> 127.0.0.1:7051: Network error: Failed to ping master at 127.0.0.1:7051: 
> Client connection negotiation failed: client connection to 127.0.0.1:7051: 
> connect: Connection refused (error 111)
> W0726 21:13:15.084311 21047 heartbeater.cc:499] Failed to heartbeat to 
> 127.0.0.1:7051: Network error: Failed to ping master at 127.0.0.1:7051: 
> Client connection negotiation failed: client connection to 127.0.0.1:7051: 
> connect: Connection refused (error 111)
> W0726 21:13:15.084451 21047 heartbeater.cc:499] Failed to heartbeat to 
> 127.0.0.1:7051: Network error: Failed to ping master at 127.0.0.1:7051: 
> Client connection negotiation failed: client connection to 127.0.0.1:7051: 
> connect: Connection refused (error 111)
> W0726 21:13:15.084461 21047 heartbeater.cc:326] Failed 3 heartbeats in a row: 
> no longer allowing fast heartbeat attempts.
> W0726 22:32:04.053184 115410 log.cc:665] Time spent T 
> b1dc3548d9994642b0237fa267835d7d P ea1677566c84417781c6470977bf5ab0: Append 
> to log took a long time: real 0.078s  user 0.026s     sys 0.050s
> W0726 22:32:08.173146 115415 log.cc:665] Time spent T 
> d244f5c3ae53407688176aecdba8fc97 P ea1677566c84417781c6470977bf5ab0: Append 
> to log took a long time: real 0.067s  user 0.028s     sys 0.037s
> W0726 22:32:09.778363 115410 log.cc:665] Time spent T 
> b1dc3548d9994642b0237fa267835d7d P ea1677566c84417781c6470977bf5ab0: Append 
> to log took a long time: real 0.101s  user 0.024s     sys 0.073s
> W0726 22:32:12.515143 115410 log.cc:665] Time spent T 
> b1dc3548d9994642b0237fa267835d7d P ea1677566c84417781c6470977bf5ab0: Append 
> to log took a long time: real 0.083s  user 0.032s     sys 0.047s
> W0726 22:32:20.965201 115615 log.cc:665] Time spent T 
> e601566f7a3a46c4aff9d4e5a3681929 P ea1677566c84417781c6470977bf5ab0: Append 
> to log took a long time: real 0.066s  user 0.025s     sys 0.039s
> W0726 22:32:21.965332 115561 log.cc:665] Time spent T 
> 8f219cda1bf7401c9b58884b4f7f7d5f P ea1677566c84417781c6470977bf5ab0: Append 
> to log took a long time: real 0.148s  user 0.023s     sys 0.122s
> W0726 22:32:21.974016 115615 log.cc:665] Time spent T 
> e601566f7a3a46c4aff9d4e5a3681929 P ea1677566c84417781c6470977bf5ab0: Append 
> to log took a long time: real 0.072s  user 0.026s     sys 0.042s
> W0726 22:32:23.912673 115561 log.cc:665] Time spent T 
> 8f219cda1bf7401c9b58884b4f7f7d5f P ea1677566c84417781c6470977bf5ab0: Append 
> to log took a long time: real 0.088s  user 0.027s     sys 0.060s
> W0726 22:32:26.854365 115615 log.cc:665] Time spent T 
> e601566f7a3a46c4aff9d4e5a3681929 P ea1677566c84417781c6470977bf5ab0: Append 
> to log took a long time: real 0.111s  user 0.029s     sys 0.009s
> W0726 22:33:16.664842 116458 log.cc:665] Time spent T 
> 7c8ce06d9df9491fbae6844a44c2e8a4 P ea1677566c84417781c6470977bf5ab0: Append 
> to log took a long time: real 0.181s  user 0.019s     sys 0.004s
> W0726 23:35:14.737128 20820 thread.cc:506] raft [worker] (thread pool) Time 
> spent starting thread: real 0.977s  user 0.000s     sys 0.000s
> W0726 23:35:15.670353 20820 thread.cc:512] raft [worker] (thread pool) Time 
> spent creating pthread: real 0.852s user 0.000s     sys 0.000s
> W0726 23:35:15.670446 20829 connection.cc:625] client connection to 
> 127.0.0.1:31201 send error: Network error: sendmsg error: Connection reset by 
> peer (error 104)
> W0726 23:35:15.670465 20829 consensus_peers.cc:378] T 
> cbaaf0212ddb48e297ff5668a233d97b P ea1677566c84417781c6470977bf5ab0 -> Peer 
> 7f763fa32d9a4f828eba4361d90c3830 (ip-172-31-13-116:31201): Couldn't send 
> request to peer 7f763fa32d9a4f828eba4361d90c3830 for tablet 
> cbaaf0212ddb48e297ff5668a233d97b. Status: Network error: sendmsg error: 
> Connection reset by peer (error 104). Retrying in the next heartbeat period. 
> Already tried 1 times.
> W0726 23:35:15.670476 20829 consensus_peers.cc:378] T 
> 594f54d9fed34d23aaef264b462df5e9 P ea1677566c84417781c6470977bf5ab0 -> Peer 
> 7f763fa32d9a4f828eba4361d90c3830 (ip-172-31-13-116:31201): Couldn't send 
> request to peer 7f763fa32d9a4f828eba4361d90c3830 for tablet 
> 594f54d9fed34d23aaef264b462df5e9. Status: Network error: sendmsg error: 
> Connection reset by peer (error 104). Retrying in the next heartbeat period. 
> Already tried 1 times.
> W0726 23:35:15.670486 20829 consensus_peers.cc:378] T 
> 95cf23a6ac0d47ce90c1ef84defa64ea P ea1677566c84417781c6470977bf5ab0 -> Peer 
> 7f763fa32d9a4f828eba4361d90c3830 (ip-172-31-13-116:31201): Couldn't send 
> request to peer 7f763fa32d9a4f828eba4361d90c3830 for tablet 
> 95cf23a6ac0d47ce90c1ef84defa64ea. Status: Network error: sendmsg error: 
> Connection reset by peer (error 104). Retrying in the next heartbeat period. 
> Already tried 1 times.
> W0726 23:35:15.670704 20820 thread.cc:506] raft [worker] (thread pool) Time 
> spent starting thread: real 0.852s  user 0.000s     sys 0.000s
> W0726 23:35:15.847403 20829 consensus_peers.cc:378] T 
> 095cab3197494b54bf4498ca10df8087 P ea1677566c84417781c6470977bf5ab0 -> Peer 
> 7f763fa32d9a4f828eba4361d90c3830 (ip-172-31-13-116:31201): Couldn't send 
> request to peer 7f763fa32d9a4f828eba4361d90c3830 for tablet 
> 095cab3197494b54bf4498ca10df8087. Status: Network error: Client connection 
> negotiation failed: client connection to 127.0.0.1:31201: connect: Connection 
> refused (error 111). Retrying in the next heartbeat period. Already tried 1 
> times
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
