Mike Percy created KUDU-2736:
--------------------------------
Summary: RemoteKsckTest.TestClusterWithLocation is flaky
Key: KUDU-2736
URL: https://issues.apache.org/jira/browse/KUDU-2736
Project: Kudu
Issue Type: Improvement
Components: test
Affects Versions: 1.9.0
Reporter: Mike Percy
RemoteKsckTest.TestClusterWithLocation is flaky
Alexey took a look at it and here is the analysis:
In essence, due to slowness of TSAN builds, connection negotiation from kudu
CLI to one of master servers timed out, so one of the preconditions of the test
didn't meet. The error output by the test was:
{code:java}
/data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/ksck_remote-test.cc:523:
Failure
Failed
Bad status: Network error: failed to gather info from all masters: 1 of 3 had
errors
{code}
The corresponding error in the master's log was:
{code:java}
W0221 12:38:27.119146 31380 negotiation.cc:313] Failed RPC negotiation. Trace:
0221 12:38:23.949428 (+ 0us) reactor.cc:583] Submitting negotiation task
for client connection to 127.25.42.190:51799
0221 12:38:25.362220 (+1412792us) negotiation.cc:98] Waiting for socket to
connect
0221 12:38:25.363489 (+ 1269us) client_negotiation.cc:167] Beginning
negotiation
0221 12:38:25.369976 (+ 6487us) client_negotiation.cc:244] Sending NEGOTIATE
NegotiatePB request
0221 12:38:25.431582 (+ 61606us) client_negotiation.cc:261] Received NEGOTIATE
NegotiatePB response
0221 12:38:25.431610 (+ 28us) client_negotiation.cc:355] Received NEGOTIATE
response from server
0221 12:38:25.432659 (+ 1049us) client_negotiation.cc:182] Negotiated
authn=SASL
0221 12:38:27.051125 (+1618466us) client_negotiation.cc:483] Received
TLS_HANDSHAKE response from server
0221 12:38:27.062085 (+ 10960us) client_negotiation.cc:471] Sending
TLS_HANDSHAKE message to server
0221 12:38:27.062132 (+ 47us) client_negotiation.cc:244] Sending
TLS_HANDSHAKE NegotiatePB request
0221 12:38:27.064391 (+ 2259us) negotiation.cc:304] Negotiation complete:
Timed out: Client connection negotiation failed: client connection to
127.25.42.190:51799: BlockingWrite timed out
{code}
We are seeing this on the flaky test dashboard for both TSAN and ASAN builds.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)