[
https://issues.apache.org/jira/browse/KUDU-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16947274#comment-16947274
]
Adar Dembo commented on KUDU-2966:
----------------------------------
Forgot to mention; here's how the delay manifested in the Java client's traces:
{noformat}
...
[10039ms] querying master
[10040ms] Sub rpc: ConnectToMaster sending RPC to server master-cdhmn002.mydomain.local:7051
[10040ms] Sub rpc: ConnectToMaster sending RPC to server master-cdhmn004.mydomain.local:7051
[10040ms] Sub rpc: ConnectToMaster sending RPC to server master-cdhmn005.mydomain.local:7051
[10050ms] Sub rpc: ConnectToMaster received from server master-cdhmn002.mydomain.local:7051 response OK
[10050ms] Sub rpc: ConnectToMaster received from server master-cdhmn005.mydomain.local:7051 response OK
[20060ms] Sub rpc: ConnectToMaster received from server master-cdhmn004.mydomain.local:7051 response Network error: [peer master-cdhmn004.mydomain.local:7051] encountered a read timeout; closing the channel
...
{noformat}
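To make the delay in the Java trace explicit, here is a minimal sketch that pairs each "sending RPC" entry with its "received" entry and reports the elapsed time per master. The trace lines are copied from the output above with wrapped lines joined and the error text shortened; the function name {{round_trips}} and the regular expression are illustrative assumptions, not part of the Kudu client.

```python
import re

# Trace lines copied from the Java client output above (wrapped lines joined,
# error message shortened). Not produced by this script.
TRACE = """\
[10040ms] Sub rpc: ConnectToMaster sending RPC to server master-cdhmn002.mydomain.local:7051
[10040ms] Sub rpc: ConnectToMaster sending RPC to server master-cdhmn004.mydomain.local:7051
[10040ms] Sub rpc: ConnectToMaster sending RPC to server master-cdhmn005.mydomain.local:7051
[10050ms] Sub rpc: ConnectToMaster received from server master-cdhmn002.mydomain.local:7051 response OK
[10050ms] Sub rpc: ConnectToMaster received from server master-cdhmn005.mydomain.local:7051 response OK
[20060ms] Sub rpc: ConnectToMaster received from server master-cdhmn004.mydomain.local:7051 response Network error
"""

# Hypothetical pattern for the trace format shown above.
LINE_RE = re.compile(
    r"\[(\d+)ms\] Sub rpc: ConnectToMaster (sending RPC to|received from) server (\S+)"
)


def round_trips(trace):
    """Return {server: elapsed_ms} between the send and receive entries."""
    sent = {}
    elapsed = {}
    for line in trace.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        timestamp_ms = int(match.group(1))
        kind = match.group(2)
        server = match.group(3)
        if kind.startswith("sending"):
            sent[server] = timestamp_ms
        elif server in sent:
            elapsed[server] = timestamp_ms - sent[server]
    return elapsed


if __name__ == "__main__":
    for server, ms in sorted(round_trips(TRACE).items()):
        print(f"{server}: {ms}ms")
```

The two healthy masters answer in 10ms, while master-cdhmn004 takes 10020ms to fail, which is consistent with the 10s negotiation timeout hardcoded in the Java client.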
And in the C++ client:
{noformat}
W1005 08:37:49.847681 1969583 negotiation.cc:313] Failed RPC negotiation. Trace:
1005 08:37:46.846727 (+ 0us) reactor.cc:583] Submitting negotiation task for client connection to 172.22.152.82:7050
1005 08:37:46.880187 (+ 33460us) negotiation.cc:98] Waiting for socket to connect
1005 08:37:46.880194 (+ 7us) client_negotiation.cc:168] Beginning negotiation
1005 08:37:46.880212 (+ 18us) client_negotiation.cc:245] Sending NEGOTIATE NegotiatePB request
1005 08:37:46.880378 (+ 166us) client_negotiation.cc:262] Received NEGOTIATE NegotiatePB response
1005 08:37:46.880379 (+ 1us) client_negotiation.cc:356] Received NEGOTIATE response from server
1005 08:37:46.880383 (+ 4us) client_negotiation.cc:183] Negotiated authn=SASL
1005 08:37:46.880427 (+ 44us) client_negotiation.cc:472] Sending TLS_HANDSHAKE message to server
1005 08:37:46.880428 (+ 1us) client_negotiation.cc:245] Sending TLS_HANDSHAKE NegotiatePB request
1005 08:37:46.882796 (+ 2368us) client_negotiation.cc:262] Received TLS_HANDSHAKE NegotiatePB response
1005 08:37:46.882797 (+ 1us) client_negotiation.cc:485] Received TLS_HANDSHAKE response from server
1005 08:37:46.886664 (+ 3867us) client_negotiation.cc:472] Sending TLS_HANDSHAKE message to server
1005 08:37:46.886666 (+ 2us) client_negotiation.cc:245] Sending TLS_HANDSHAKE NegotiatePB request
1005 08:37:49.847411 (+2960745us) negotiation.cc:304] Negotiation complete: Network error: Client connection negotiation failed: client connection to 172.22.152.82:7050: BlockingRecv error: recv got EOF from 172.22.152.82:7050 (error 108)
Metrics: {"client-negotiator.queue_time_us":33440}
{noformat}
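As a quick check on the C++ trace, the sketch below sums the per-step deltas and picks out the slowest step. The lines are copied from the trace above with the messages trimmed to timestamp, delta, and source location; the helper name {{step_deltas_us}} and the regular expression are illustrative assumptions.

```python
import re

# Timestamp, delta, and source location copied from the C++ trace above;
# the trailing message text is trimmed.
TRACE = """\
1005 08:37:46.846727 (+ 0us) reactor.cc:583]
1005 08:37:46.880187 (+ 33460us) negotiation.cc:98]
1005 08:37:46.880194 (+ 7us) client_negotiation.cc:168]
1005 08:37:46.880212 (+ 18us) client_negotiation.cc:245]
1005 08:37:46.880378 (+ 166us) client_negotiation.cc:262]
1005 08:37:46.880379 (+ 1us) client_negotiation.cc:356]
1005 08:37:46.880383 (+ 4us) client_negotiation.cc:183]
1005 08:37:46.880427 (+ 44us) client_negotiation.cc:472]
1005 08:37:46.880428 (+ 1us) client_negotiation.cc:245]
1005 08:37:46.882796 (+ 2368us) client_negotiation.cc:262]
1005 08:37:46.882797 (+ 1us) client_negotiation.cc:485]
1005 08:37:46.886664 (+ 3867us) client_negotiation.cc:472]
1005 08:37:46.886666 (+ 2us) client_negotiation.cc:245]
1005 08:37:49.847411 (+2960745us) negotiation.cc:304]
"""

# Hypothetical pattern for the "(+ <n>us)" delta column shown above.
DELTA_RE = re.compile(r"\(\+\s*(\d+)us\)")


def step_deltas_us(trace):
    """Return the per-step deltas, in microseconds, in trace order."""
    return [int(m.group(1)) for m in DELTA_RE.finditer(trace)]


if __name__ == "__main__":
    deltas = step_deltas_us(TRACE)
    print(f"total:   {sum(deltas)}us")
    print(f"slowest: {max(deltas)}us")
```

The deltas add up to 3000684us, with 2960745us of that spent waiting after the second TLS_HANDSHAKE request was sent. That total is consistent with the 3s negotiation timeout hardcoded in the C++ client.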
> Make client negotiation timeouts configurable
> ---------------------------------------------
>
> Key: KUDU-2966
> URL: https://issues.apache.org/jira/browse/KUDU-2966
> Project: Kudu
> Issue Type: Bug
> Components: java, rpc
> Affects Versions: 1.11.0
> Reporter: Adar Dembo
> Priority: Major
>
> We saw a cluster in the wild where some negotiation steps between endpoints
> were delayed by a few extra seconds. The existing
> {{\-\-rpc_negotiation_timeout_ms}} gflag can help work around this on servers,
> but there's no equivalent in clients, whose negotiation timeouts are
> hardcoded to 3s in the C++ client and 10s in the Java client.
> It would be nice to expose a simple API to reconfigure the negotiation
> timeout.