[ 
https://issues.apache.org/jira/browse/KUDU-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533006#comment-17533006
 ] 

Wenzhe Zhou commented on KUDU-3366:
-----------------------------------

Here is the code analysis in detail:

The Impala coordinator calls RpcController::Cancel() to schedule an RPC 
cancellation task on the reactor thread pool. When a reactor thread executes 
the task in ReactorThread::CancelOutboundCall(), that function calls 
Connection::CancelOutboundCall() and then OutboundCall::Cancel(). 
Connection::CancelOutboundCall() resets car->call to a null pointer, which 
causes Connection::HandleOutboundCallTimeout() to skip calling 
OutboundCall::SetTimedOut(). OutboundCall::Cancel() does not call 
OutboundCall::SetCancelled() if the OutboundCall object is in the SENDING 
state. In that case OutboundCall::SetCancelled() is deferred until 
OutboundCall::SetSent() is called, which moves the state from SENDING to 
SENT. So once an RPC is cancelled, OutboundCall::SetTimedOut() will not be 
called for its OutboundCall object when the timeout is handled in 
Connection::HandleOutboundCallTimeout(), and, if the object is in the SENDING 
state, OutboundCall::SetCancelled() will not be called until 
OutboundCall::SetSent() is called.
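
The deferred cancellation described above can be illustrated with a small toy 
model. This is only a sketch and not Kudu's actual code: ToyOutboundCall, its 
reduced set of states, and the callback type are all assumptions made for 
illustration. It shows that a Cancel() issued in the SENDING state fires the 
callback only after SetSent() runs.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Toy model only: the names echo the comment above, not the real Kudu classes.
enum class State { READY, SENDING, SENT, CANCELLED };

class ToyOutboundCall {
 public:
  explicit ToyOutboundCall(std::function<void()> callback)
      : callback_(std::move(callback)) {}

  // The transfer of the call has started.
  void SetSending() {
    assert(state_ == State::READY);
    state_ = State::SENDING;
  }

  // Reached only when the transfer finishes successfully. A cancellation
  // requested while SENDING is honoured here.
  void SetSent() {
    assert(state_ == State::SENDING);
    state_ = State::SENT;
    if (cancellation_requested_) SetCancelled();
  }

  // Cancellation takes effect immediately except in SENDING, where it is
  // only recorded and left for SetSent() to act on.
  void Cancel() {
    cancellation_requested_ = true;
    if (state_ == State::READY || state_ == State::SENT) SetCancelled();
  }

  State state() const { return state_; }

 private:
  // Moves to a finished state and runs the user callback.
  void SetCancelled() {
    state_ = State::CANCELLED;
    if (callback_) callback_();
  }

  State state_ = State::READY;
  bool cancellation_requested_ = false;
  std::function<void()> callback_;
};
```

In this model, if SetSent() is never reached after Cancel(), the object stays 
in SENDING and the callback never runs.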

OutboundCall::SetSent() is called by 
CallTransferCallbacks::NotifyTransferFinished() when the notification that the 
transfer has finished is received after the RPC call has been sent on the wire.
Connection::ProcessOutboundTransfers() calls OutboundCall::SetSending() to set 
the OutboundCall's state to SENDING when it starts transferring the RPC. It 
then calls OutboundTransfer::SendBuffer() to send the data through the socket.
OutboundTransfer::SendBuffer() calls socket->Writev() to send the data. If 
socket->Writev() returns an error, SendBuffer() returns the error without 
calling CallTransferCallbacks::NotifyTransferFinished(), so 
OutboundCall::SetSent() is not called. As a result, 
OutboundCall::SetCancelled() is never called for the OutboundCall object.
Connection::ProcessOutboundTransfers() then calls 
ReactorThread::DestroyConnection() to destroy the connection. 
ReactorThread::DestroyConnection() calls Connection::Shutdown() to clear all 
outbound calls that have been sent and are awaiting a response. But for an RPC 
that is being cancelled, car->call has already been reset to a null pointer, so 
OutboundCall::SetFailed() is not called for the OutboundCall object.
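
The failure path above can be sketched with another toy model. Again, this is 
an illustration under stated assumptions and not Kudu's code: ToyCall, 
ToyCallAwaitingResponse and ToyConnection are hypothetical stand-ins, and 
Finish() collapses SetFailed(), SetTimedOut() and SetCancelled() into one 
step. The sketch shows that once the tracked call pointer has been reset, 
neither the timeout handler nor Shutdown() can finish the call.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <utility>

// Toy model only; names echo the comment but this is not Kudu's code.
struct ToyCall {
  bool sending = false;   // true while the call is in the SENDING state
  bool finished = false;  // true once the call reached a finished state
  std::function<void()> callback;

  // Stands in for SetFailed() / SetTimedOut() / SetCancelled().
  void Finish() {
    if (finished) return;
    finished = true;
    if (callback) callback();
  }
  // Cancellation is deferred while SENDING, as described above.
  void Cancel() {
    if (!sending) Finish();
  }
};

struct ToyCallAwaitingResponse {
  std::shared_ptr<ToyCall> call;  // plays the role of car->call
};

class ToyConnection {
 public:
  void Track(int call_id, std::shared_ptr<ToyCall> call) {
    assert(call != nullptr);
    awaiting_response_[call_id].call = std::move(call);
  }
  // Resets the tracked pointer, like the reset of car->call.
  void CancelOutboundCall(int call_id) {
    auto it = awaiting_response_.find(call_id);
    if (it != awaiting_response_.end()) it->second.call.reset();
  }
  // Skips entries whose call pointer has been reset.
  void HandleOutboundCallTimeout(int call_id) {
    auto it = awaiting_response_.find(call_id);
    if (it == awaiting_response_.end() || !it->second.call) return;
    it->second.call->Finish();
  }
  // Fails every tracked call that still has a non-null pointer.
  void Shutdown() {
    for (auto& entry : awaiting_response_) {
      if (entry.second.call) entry.second.call->Finish();
    }
    awaiting_response_.clear();
  }

 private:
  std::map<int, ToyCallAwaitingResponse> awaiting_response_;
};
```

In this model a call that was cancelled while SENDING is invisible to both 
HandleOutboundCallTimeout() and Shutdown(), so if the write fails its callback 
has no remaining path to run.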

To summarize, Connection::CancelOutboundCall() resets car->call to a null 
pointer, which causes Connection::HandleOutboundCallTimeout() to skip calling 
OutboundCall::SetTimedOut() and Connection::Shutdown() to skip calling 
OutboundCall::SetFailed(). A socket->Writev() error prevents 
OutboundCall::SetSent() from being called, and hence 
OutboundCall::SetCancelled() is not called either.
Since none of OutboundCall::SetFailed(), OutboundCall::SetCancelled() and 
OutboundCall::SetTimedOut() is called for the OutboundCall object, the 
object can never move from the SENDING state to a finished state, so the 
RPC callback function is never called.

> KRPC callback function not called when cancelling KRPC
> ------------------------------------------------------
>
>                 Key: KUDU-3366
>                 URL: https://issues.apache.org/jira/browse/KUDU-3366
>             Project: Kudu
>          Issue Type: Bug
>          Components: rpc
>            Reporter: Wenzhe Zhou
>            Priority: Major
>
> Impala ran into an issue that caused a thread to hang when cancelling a 
> query. Impala log messages show that the Impala coordinator called 
> RpcController::Cancel() to cancel an RPC, then waited for the RPC callback 
> function to be called. But the KRPC callback function was never called, 
> which caused the Impala thread to wait forever. See Impala-11263.
> KRPC cancellation was implemented in KUDU-2065 with patch 
> https://gerrit.cloudera.org/#/c/7455/. According to the comments on 
> KUDU-2065, the authors decided not to cancel outbound requests in the 
> SENDING state, since cancelling calls in that state seemed too complicated, 
> and they expected most calls to be drained quickly so that the outbound 
> request would move from SENDING to SENT.
> But the reactor thread function ReactorThread::CancelOutboundCall() calls 
> Connection::CancelOutboundCall() before calling OutboundCall::Cancel(). 
> Connection::CancelOutboundCall() resets car->call to a null pointer, which 
> causes Connection::HandleOutboundCallTimeout() to skip calling 
> OutboundCall::SetTimedOut(), and Connection::Shutdown() to skip calling 
> OutboundCall::SetFailed(). If socket->Writev() fails while the outbound 
> request is in the SENDING state, 
> CallTransferCallbacks::NotifyTransferFinished() is not called, hence 
> OutboundCall::SetSent() is not called. The outbound request then cannot 
> move from the SENDING state to the SENT state, so the KRPC callback 
> function is never called in this corner case.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
