[
https://issues.apache.org/jira/browse/KUDU-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533006#comment-17533006
]
Wenzhe Zhou commented on KUDU-3366:
-----------------------------------
Here is the code analysis in detail:
The Impala coordinator calls RpcController::Cancel() to schedule an RPC
cancellation task on the reactor thread pool. When a reactor thread executes
the cancellation task in ReactorThread::CancelOutboundCall(), that function
calls Connection::CancelOutboundCall() and then OutboundCall::Cancel().
Connection::CancelOutboundCall() resets car->call to a null pointer, which
causes Connection::HandleOutboundCallTimeout() to skip calling
OutboundCall::SetTimedOut(). OutboundCall::Cancel() does not call
OutboundCall::SetCancelled() if the OutboundCall object is in the SENDING
state. Instead, OutboundCall::SetCancelled() is deferred until
OutboundCall::SetSent() is called, which is when the state transitions from
SENDING to SENT. So once an RPC is cancelled, OutboundCall::SetTimedOut() will
not be called for its OutboundCall object when the timeout is handled in
Connection::HandleOutboundCallTimeout(), and, if the object is in the SENDING
state, OutboundCall::SetCancelled() will not be called until
OutboundCall::SetSent() is called. OutboundCall::SetSent() is called by
CallTransferCallbacks::NotifyTransferFinished() when the notification that the
transfer has finished is received, after the RPC call has been sent on the wire.
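The deferred cancellation described above can be sketched with a small state
machine. This is a simplified model, not the Kudu implementation: the class
ModelOutboundCall, the reduced CallState enum and the cancellation_requested_
flag are assumptions made for illustration, and only the method names follow
the analysis.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical, simplified model; not the real kudu::rpc::OutboundCall.
enum class CallState { READY, SENDING, SENT, CANCELLED };

class ModelOutboundCall {
 public:
  explicit ModelOutboundCall(std::function<void()> callback)
      : callback_(std::move(callback)) {}

  void SetSending() { state_ = CallState::SENDING; }

  // Records the cancellation request. While the call is in SENDING, the
  // transition to CANCELLED is deferred to SetSent().
  void Cancel() {
    cancellation_requested_ = true;
    if (state_ == CallState::READY || state_ == CallState::SENT) {
      SetCancelled();
    }
  }

  // Called once the transfer has finished; applies a deferred cancellation.
  void SetSent() {
    state_ = CallState::SENT;
    if (cancellation_requested_) {
      SetCancelled();
    }
  }

  CallState state() const { return state_; }

 private:
  // Moves the call to a finished state and runs the user callback.
  void SetCancelled() {
    state_ = CallState::CANCELLED;
    callback_();
  }

  std::function<void()> callback_;
  CallState state_ = CallState::READY;
  bool cancellation_requested_ = false;
};
```

In this model, a call cancelled while in SENDING stays in SENDING, and its
callback stays pending, for as long as SetSent() is not invoked.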
Connection::ProcessOutboundTransfers() calls OutboundCall::SetSending() to set
the OutboundCall's state to SENDING when it starts transferring the RPC. It
then calls OutboundTransfer::SendBuffer() to send the data through the socket.
OutboundTransfer::SendBuffer() calls socket->Writev() to send the data. If
socket->Writev() returns an error, SendBuffer() returns the error without
calling CallTransferCallbacks::NotifyTransferFinished(), so
OutboundCall::SetSent() is not called. As a result,
OutboundCall::SetCancelled() is never called for the OutboundCall object.
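The early return on a write error can be illustrated with a small stand-in.
ModelTransfer and its two std::function members are hypothetical; they only
mimic the roles of socket->Writev() and
CallTransferCallbacks::NotifyTransferFinished() and are not Kudu's actual
signatures.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for an outbound transfer; not Kudu's OutboundTransfer.
class ModelTransfer {
 public:
  // `write` plays the role of socket->Writev() and returns true on success.
  // `on_finished` plays the role of NotifyTransferFinished().
  ModelTransfer(std::function<bool()> write, std::function<void()> on_finished)
      : write_(std::move(write)), on_finished_(std::move(on_finished)) {}

  // Returns false on a write error; in that case the completion
  // notification is never delivered.
  bool SendBuffer() {
    if (!write_()) {
      return false;
    }
    on_finished_();
    return true;
  }

 private:
  std::function<bool()> write_;
  std::function<void()> on_finished_;
};
```

Any state change that is driven only by the completion notification is
therefore skipped on the error path.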
Connection::ProcessOutboundTransfers() then calls
ReactorThread::DestroyConnection() to destroy the connection.
ReactorThread::DestroyConnection() calls Connection::Shutdown() to clear all
outbound calls that have been sent and are awaiting a response. But for an RPC
that is being cancelled, car->call has already been reset to a null pointer, so
OutboundCall::SetFailed() is not called for the OutboundCall object.
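A minimal sketch of why the shutdown path misses a cancelled call, assuming
that "car" refers to a per-call tracking entry holding a pointer to the call.
ModelCall, ModelCar, ModelConnection and the map layout are hypothetical and
much simpler than the real Connection class.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>

// Hypothetical model types; names loosely follow the analysis above.
struct ModelCall {
  bool failed = false;
  void SetFailed() { failed = true; }
};

// Tracking entry for a call that is awaiting a response (assumed meaning).
struct ModelCar {
  std::shared_ptr<ModelCall> call;
};

class ModelConnection {
 public:
  void Track(int call_id, std::shared_ptr<ModelCall> call) {
    awaiting_response_[call_id].call = std::move(call);
  }

  // Detaches the call from its tracking entry, as described for cancellation.
  void CancelOutboundCall(int call_id) {
    auto it = awaiting_response_.find(call_id);
    if (it != awaiting_response_.end()) {
      it->second.call.reset();
    }
  }

  // Fails every call that is still attached; detached entries are skipped.
  void Shutdown() {
    for (auto& entry : awaiting_response_) {
      if (entry.second.call) {
        entry.second.call->SetFailed();
      }
    }
    awaiting_response_.clear();
  }

 private:
  std::map<int, ModelCar> awaiting_response_;
};
```

In this model, a call that was detached by CancelOutboundCall() is left
untouched by Shutdown(), while a call that is still tracked is failed.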
To summarize, Connection::CancelOutboundCall() resets car->call to a null
pointer, which causes Connection::HandleOutboundCallTimeout() to skip calling
OutboundCall::SetTimedOut() and Connection::Shutdown() to skip calling
OutboundCall::SetFailed(). An error from socket->Writev() means
OutboundCall::SetSent() is not called, and hence OutboundCall::SetCancelled()
is not called either.
Since none of OutboundCall::SetFailed(), OutboundCall::SetCancelled() and
OutboundCall::SetTimedOut() is called for the OutboundCall object, the object
can never transition from the SENDING state to a finished state, so the RPC
callback function is never called.
> KRPC callback function not called when cancelling KRPC
> ------------------------------------------------------
>
> Key: KUDU-3366
> URL: https://issues.apache.org/jira/browse/KUDU-3366
> Project: Kudu
> Issue Type: Bug
> Components: rpc
> Reporter: Wenzhe Zhou
> Priority: Major
>
> Impala ran into an issue which caused a thread hang when cancelling a query.
> Impala log messages show that the Impala coordinator called
> RpcController::Cancel() to cancel an RPC, then waited for the RPC callback
> function to be called. But the KRPC callback function was not called. This
> caused the Impala thread to wait forever. See Impala-11263.
> KRPC cancellation was implemented in KUDU-2065 with patch
> https://gerrit.cloudera.org/#/c/7455/. According to the comments on
> KUDU-2065, the authors decided not to cancel outbound requests in the
> SENDING state, since cancelling calls in that state seemed too complicated;
> they expected most calls to be drained quickly, so that the outbound request
> would transition from SENDING to SENT.
> But the reactor thread function ReactorThread::CancelOutboundCall() calls
> Connection::CancelOutboundCall() before calling OutboundCall::Cancel().
> Connection::CancelOutboundCall() resets car->call to a null pointer, which
> causes Connection::HandleOutboundCallTimeout() to skip calling
> OutboundCall::SetTimedOut(), and Connection::Shutdown() to skip calling
> OutboundCall::SetFailed(). If socket->Writev() fails while the outbound
> request is in the SENDING state, CallTransferCallbacks::NotifyTransferFinished()
> will not be called, hence OutboundCall::SetSent() will not be called. As a
> result, the outbound request cannot transition from the SENDING state to the
> SENT state, hence the KRPC callback function is not called in this corner case.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)