Wenzhe Zhou created KUDU-3366:
---------------------------------
Summary: KRPC callback function not called when cancelling KRPC
Key: KUDU-3366
URL: https://issues.apache.org/jira/browse/KUDU-3366
Project: Kudu
Issue Type: Bug
Components: rpc
Reporter: Wenzhe Zhou
Impala ran into an issue that caused a thread to hang when cancelling a query.
Impala log messages show that the Impala coordinator called
RpcController::Cancel() to cancel an RPC and then waited for the RPC callback
function to be called. The KRPC callback function was never called, so the
Impala thread waited forever. See IMPALA-11263.
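The hang follows a common pattern: after requesting cancellation, the caller
blocks until the RPC callback signals completion, so if the callback never runs,
the wait never returns. The C++ sketch below is a hypothetical illustration, not
Impala or Kudu code; `CallbackLatch`, `Signal()` and `WaitFor()` are invented
names. It uses a bounded `std::condition_variable::wait_for()` so that a missing
callback shows up as a `false` return instead of a permanent block.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical latch (not Impala/Kudu code): the thread that requested
// cancellation waits here until the RPC callback calls Signal().
class CallbackLatch {
 public:
  // Would be invoked from the RPC completion callback.
  void Signal() {
    std::lock_guard<std::mutex> lock(mu_);
    done_ = true;
    cv_.notify_all();
  }

  // Returns true if the callback ran within `timeout`, false otherwise.
  // An unbounded cv_.wait() here would block forever if Signal() is
  // never called, which is the symptom described above.
  bool WaitFor(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mu_);
    return cv_.wait_for(lock, timeout, [this] { return done_; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool done_ = false;
};
```

A bounded wait only makes the symptom observable; it does not make the missing
callback run.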
KRPC cancellation was implemented in KUDU-2065 with patch
https://gerrit.cloudera.org/#/c/7455/. According to the comments on KUDU-2065,
the developers decided not to cancel outbound requests in the SENDING state,
because cancelling calls in that state seemed too complicated. Instead, they
expected most calls to be drained quickly, so that an outbound request would
soon move from the SENDING state to the SENT state.
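The cancellation policy described above can be modelled as a small state
machine. In this hypothetical sketch (simplified states, not Kudu's actual
OutboundCall), a cancellation requested while the call is SENDING is only
recorded, and is assumed to take effect once the transfer completes and the call
moves to SENT; the callback runs exactly once, when the call reaches a final
state. `ModelCall`, `StartSending()` and `Finish()` are invented names.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Simplified, hypothetical call states (not Kudu's actual enum).
enum class CallState { kReady, kSending, kSent, kCancelled };

class ModelCall {
 public:
  explicit ModelCall(std::function<void(CallState)> callback)
      : callback_(std::move(callback)) {}

  void StartSending() { state_ = CallState::kSending; }

  // Cancel immediately unless the call is SENDING; in that case the request
  // is only recorded (assumed behaviour, following the KUDU-2065 comments).
  void Cancel() {
    cancel_requested_ = true;
    if (state_ == CallState::kReady || state_ == CallState::kSent) {
      Finish(CallState::kCancelled);
    }
  }

  // Transfer finished: SENDING -> SENT, then apply a pending cancellation.
  void SetSent() {
    if (state_ != CallState::kSending) return;
    state_ = CallState::kSent;
    if (cancel_requested_) Finish(CallState::kCancelled);
  }

  CallState state() const { return state_; }

 private:
  // Moves to a final state and runs the callback at most once.
  void Finish(CallState final_state) {
    state_ = final_state;
    if (callback_) {
      auto cb = std::move(callback_);
      callback_ = nullptr;
      cb(final_state);
    }
  }

  CallState state_ = CallState::kReady;
  bool cancel_requested_ = false;
  std::function<void(CallState)> callback_;
};
```

The model makes the dependency explicit: for a call cancelled while SENDING, the
only path to the callback runs through SetSent().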
However, the reactor thread function ReactorThread::CancelOutboundCall() calls
Connection::CancelOutboundCall() before calling OutboundCall::Cancel().
Connection::CancelOutboundCall() resets car->call to a null pointer, which
causes Connection::HandleOutboundCallTimeout() to skip calling
OutboundCall::SetTimedOut() and Connection::Shutdown() to skip calling
OutboundCall::SetFailed(). If socket->Writev() fails while the outbound request
is in the SENDING state, CallTransferCallbacks::NotifyTransferFinished() is not
called, so OutboundCall::SetSent() is not called either. As a result, the
outbound request never moves from the SENDING state to the SENT state, and the
KRPC callback function is never called in this corner case.
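The corner case can be reproduced in a hypothetical, single-threaded model. The
names `Call`, `CallAwaitingResponse`, `ReactorCancel()`, `HandleTimeout()` and
`ConnectionShutdown()` are illustrative stand-ins for the Kudu functions named
above, not their real implementations. The model assumes, per the KUDU-2065
comments, that a call cancelled while SENDING only completes through SetSent().
Once `car.call` is cleared, neither the timeout path nor the shutdown path can
reach the call, so after a failed write the callback never fires.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical, simplified model; not Kudu's real classes.
enum class State { kSending, kSent, kCancelled, kTimedOut, kFailed };

struct Call {
  State state = State::kSending;
  bool cancel_requested = false;
  std::function<void(State)> callback;

  // Moves to a final state and runs the callback at most once.
  void Complete(State final_state) {
    state = final_state;
    if (callback) {
      auto cb = std::move(callback);
      callback = nullptr;
      cb(final_state);
    }
  }
  // In SENDING, cancellation is only recorded (assumed, see KUDU-2065).
  void Cancel() { cancel_requested = true; }
  // Only reached if the socket write succeeds.
  void SetSent() {
    state = State::kSent;
    if (cancel_requested) Complete(State::kCancelled);
  }
  void SetTimedOut() { Complete(State::kTimedOut); }
  void SetFailed() { Complete(State::kFailed); }
};

// Connection-side bookkeeping entry ("car" in the report).
struct CallAwaitingResponse {
  std::shared_ptr<Call> call;
};

// Order reported in KUDU-3366: the connection entry is cleared first,
// then the call itself is cancelled.
void ReactorCancel(CallAwaitingResponse& car) {
  std::shared_ptr<Call> call = car.call;
  car.call.reset();
  if (call) call->Cancel();
}

// Both paths skip an entry whose call pointer is null.
void HandleTimeout(CallAwaitingResponse& car) {
  if (car.call) car.call->SetTimedOut();
}
void ConnectionShutdown(CallAwaitingResponse& car) {
  if (car.call) car.call->SetFailed();
}
```

For contrast, an entry whose call pointer was never cleared is failed on
shutdown and its callback fires, which is exactly the completion the cancelled
call loses.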
--
This message was sent by Atlassian Jira
(v8.20.7#820007)