[
https://issues.apache.org/jira/browse/IMPALA-6818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472437#comment-16472437
]
Michael Ho commented on IMPALA-6818:
------------------------------------
Ideally, we should just get rid of the timeout. However, the reason we cannot
do it today is that there may be a resource leak in some racy cases:
- The sender fragment issues a {{KrpcDataStreamSender::TransmitData()}} RPC to
the receiver.
- The receiving fragment is blocked in {{KrpcDataStreamReceiver::GetBatch()}},
waiting for a row batch to arrive.
- The query gets cancelled. Both the sender and receiver fragment instances see
the cancellation before the {{TransmitData()}} RPC arrives. The receiver cancels
itself and removes its entry from {{KrpcDataStreamMgr}}'s map. The sender
cancels the RPC, but nothing stops the payload from being sent to the
receiver. The receiving end still allocates a buffer for the cancelled RPC
and waits in {{KrpcDataStreamMgr}} for the already-exited receiver.
Today, we keep a cache of recently exited receivers, so we will usually notice
that the receiver has already gone away. However, cache entries are evicted
after a certain time period, so in theory there is always a race sequence in
which the {{TransmitData()}} RPC payload gets stuck in the receiving end's
{{KrpcDataStreamMgr}} and is never processed. This may leak resources. The
timeout we have today bails us out of this case.
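To make the race concrete, here is a toy Python simulation (not Impala code; the class and method names are made up for illustration) of a stream manager with a live-receiver map plus a TTL cache of recently closed receivers. Once the cache entry is evicted, a late payload is neither delivered nor rejected; it is buffered for a receiver that will never arrive:

```python
class StreamMgr:
    """Toy model of a stream manager: a live-receiver map plus a TTL
    cache of recently closed receivers. Names are illustrative only."""

    def __init__(self, ttl_s):
        self.ttl_s = ttl_s
        self.receivers = set()   # live receivers, keyed by fragment instance id
        self.closed = {}         # fragment instance id -> close timestamp
        self.early_senders = {}  # buffered payloads waiting for a receiver

    def register(self, finst):
        self.receivers.add(finst)

    def close(self, finst, now):
        self.receivers.discard(finst)
        self.closed[finst] = now  # remembered only for ttl_s seconds

    def transmit(self, finst, payload, now):
        # Evict stale entries from the closed-receiver cache.
        self.closed = {f: t for f, t in self.closed.items()
                       if now - t < self.ttl_s}
        if finst in self.receivers:
            return "delivered"
        if finst in self.closed:
            return "rejected"     # receiver known to have recently exited
        # Neither live nor remembered: park the payload and wait for a
        # receiver that will never show up -- this is the leaked buffer.
        self.early_senders.setdefault(finst, []).append(payload)
        return "buffered"

mgr = StreamMgr(ttl_s=60.0)
mgr.register("f1")
mgr.close("f1", now=0.0)
print(mgr.transmit("f1", b"rows", now=1.0))    # rejected: cache still warm
print(mgr.transmit("f1", b"rows", now=120.0))  # buffered: entry evicted, payload stuck
```

The second call is the race the timeout currently papers over: without an eviction-proof record of exited receivers, some external signal has to reclaim the parked payload.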
With the new IMPALA-2990 protocol, we may be able to infer that certain queries
have been cancelled based on messages from the coordinator. We can use that
information to clear out the "stuck" RPCs destined for cancelled queries.
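A minimal sketch of that idea, continuing the toy model above (again, hypothetical names, not the actual Impala or IMPALA-2990 API): when the coordinator reports a query as cancelled, drop any payloads parked for its fragment instances instead of waiting for a timeout to fire.

```python
class StreamMgrWithCancel:
    """Toy sketch of coordinator-driven cleanup of stranded payloads.
    Names and structure are illustrative, not Impala's actual design."""

    def __init__(self):
        self.early_senders = {}  # query id -> payloads parked for a receiver

    def buffer_early(self, query_id, payload):
        self.early_senders.setdefault(query_id, []).append(payload)

    def on_query_cancelled(self, query_id):
        # Coordinator says the query is dead: free anything parked for it.
        dropped = self.early_senders.pop(query_id, [])
        return len(dropped)

mgr = StreamMgrWithCancel()
mgr.buffer_early("q1", b"rows")
mgr.buffer_early("q1", b"more rows")
print(mgr.on_query_cancelled("q1"))  # 2 payloads freed
print(mgr.on_query_cancelled("q1"))  # 0: nothing left to leak
```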
> Rethink data-stream sender/receiver startup sequencing
> ------------------------------------------------------
>
> Key: IMPALA-6818
> URL: https://issues.apache.org/jira/browse/IMPALA-6818
> Project: IMPALA
> Issue Type: Sub-task
> Components: Distributed Exec
> Reporter: Dan Hecht
> Assignee: Michael Ho
> Priority: Major
>
> IMPALA-1599 introduced parallel fragment startup, which is good for startup
> latency. However, it meant that data-stream senders can start before
> receivers, and there is a timeout to handle the case when the receiver never
> shows up:
> {code:java}
> Sender timed out waiting for receiver fragment instance{code}
> We see this timeout fairly regularly (e.g. when a host has a spike in load
> and does not process the exec RPC for a while). Let's rethink how this works
> to see if we can make it robust, while being careful not to sacrifice startup
> time too much.
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]