[ 
https://issues.apache.org/jira/browse/IMPALA-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905564#comment-16905564
 ] 

Sahil Takiar commented on IMPALA-8845:
--------------------------------------

As far as I can tell, the issue described in IMPALA-3990 is still there:
 * The kRPC receiver is closed by {{FragmentInstanceState::Close}} --> 
{{ExchangeNode::Close}} --> {{KrpcDataStreamRecvr::Close}} --> 
{{KrpcDataStreamMgr::DeregisterRecvr}}
 ** {{DeregisterRecvr}} adds the receiver to the {{closed_stream_cache_}}
 * Attempts to send data to the closed receiver will initially get a 
{{DATASTREAM_RECVR_CLOSED}} response
 ** The call trace here is {{DataStreamService::TransmitData}} --> 
{{KrpcDataStreamMgr::AddData}} 
 ** If {{AddData}} finds the receiver in the {{closed_stream_cache_}}, it 
responds to the sender with a {{DATASTREAM_RECVR_CLOSED}} error
 ** When the sender receives {{DATASTREAM_RECVR_CLOSED}}, it drops all 
subsequent row batches passed to {{KrpcDataStreamSender::Channel::TransmitData}} 
(so no more RPCs should be sent to the receiver)
 * If an RPC is sent to a receiver after the {{STREAM_EXPIRATION_TIME_MS}} 
timeout is hit, then the query will fail
 ** The maintenance thread in {{KrpcDataStreamMgr::Maintenance}} will 
eventually remove the receiver from the {{closed_stream_cache_}} and attempts 
to send data to that receiver will eventually hit a 
{{DATASTREAM_SENDER_TIMEOUT}} error (after {{datastream_sender_timeout_ms}} has 
elapsed)
 ** This should be rare, because the {{DATASTREAM_RECVR_CLOSED}} handling 
should prevent any more rows from being sent to the exchange, but it can happen 
if there are long delays between consecutive row batches
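
The three outcomes above can be summarized in a small model. This is an illustrative sketch, not Impala's actual code: the class and method names mirror the ones discussed, but the data structures are simplified stand-ins and the timeout value is made up.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Possible responses to an incoming TransmitData() RPC in this model.
enum class RpcOutcome { ACCEPTED, RECVR_CLOSED, SENDER_TIMEOUT_PENDING };

class DataStreamMgrModel {
 public:
  void RegisterRecvr(const std::string& id) { active_.insert(id); }

  // Mirrors DeregisterRecvr(): the receiver moves into closed_stream_cache_,
  // stamped with the close time.
  void DeregisterRecvr(const std::string& id, int64_t now_ms) {
    active_.erase(id);
    closed_stream_cache_[id] = now_ms;
  }

  // Mirrors the Maintenance() thread: cache entries older than
  // STREAM_EXPIRATION_TIME_MS are evicted.
  void Maintenance(int64_t now_ms) {
    for (auto it = closed_stream_cache_.begin();
         it != closed_stream_cache_.end();) {
      if (now_ms - it->second > kStreamExpirationTimeMs) {
        it = closed_stream_cache_.erase(it);
      } else {
        ++it;
      }
    }
  }

  // Mirrors AddData(): a registered receiver accepts the batch; a recently
  // closed one yields DATASTREAM_RECVR_CLOSED; an unknown one leaves the
  // sender parked until it fails with DATASTREAM_SENDER_TIMEOUT.
  RpcOutcome AddData(const std::string& id) const {
    if (active_.count(id)) return RpcOutcome::ACCEPTED;
    if (closed_stream_cache_.count(id)) return RpcOutcome::RECVR_CLOSED;
    return RpcOutcome::SENDER_TIMEOUT_PENDING;
  }

  static constexpr int64_t kStreamExpirationTimeMs = 300 * 1000;  // illustrative

 private:
  std::unordered_set<std::string> active_;
  std::unordered_map<std::string, int64_t> closed_stream_cache_;  // id -> close time
};
```

The failure mode is the third branch: once {{Maintenance}} evicts the cache entry, a late RPC finds neither an active nor a recently closed receiver, so the sender waits out {{datastream_sender_timeout_ms}} and the query fails.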

So (as described in IMPALA-3990) if a fragment sends an RPC to an exchange, the 
exchange hits eos and shuts down the kRPC receiver, the 
{{STREAM_EXPIRATION_TIME_MS}} timeout expires, and then the fragment sends 
another RPC to the exchange, an error will occur after 
{{datastream_sender_timeout_ms}}, and the query will fail.

> Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState
> ------------------------------------------------------------------------
>
>                 Key: IMPALA-8845
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8845
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>
> While testing IMPALA-8818, I found that IMPALA-8780 does not always cause all 
> non-coordinator fragments to shut down. In certain setups, TopN queries 
> ({{select * from [table] order by [col] limit [limit]}}) where all results 
> are successfully spooled still keep non-coordinator fragments alive.
> The issue is that sometimes the {{DATASTREAM SINK}} for the TopN <-- Scan 
> Node fragment ends up blocked waiting for a response to a {{TransmitData()}} 
> RPC, which prevents the fragment from shutting down.
> I haven't traced the issue exactly, but what I *think* is happening is that 
> the {{MERGING-EXCHANGE}} operator in the coordinator fragment hits {{eos}} 
> whenever it has received enough rows to reach the limit defined in the query, 
> which could occur before the {{DATASTREAM SINK}} sends all the rows from the 
> TopN / Scan Node fragment.
> So the TopN / Scan Node fragments end up hanging until they are explicitly 
> closed.
> The fix is to close the {{ExecNode}} tree in {{FragmentInstanceState}} as 
> eagerly as possible. Moving the close call to before the call to 
> {{DataSink::FlushFinal}} fixes the issue. It has the added benefit that it 
> shuts down and releases all {{ExecNode}} resources as soon as it can. When 
> result spooling is enabled, this is particularly important because 
> {{FlushFinal}} might block until the consumer reads all rows.
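
The reordering the description proposes can be sketched as follows. This is a hedged model, not Impala's actual implementation: the tail of fragment execution is reduced to an ordered list of steps to show why closing the {{ExecNode}} tree before {{DataSink::FlushFinal}} matters when {{FlushFinal}} can block.

```cpp
#include <string>
#include <vector>

// Sketch of the tail of fragment execution in FragmentInstanceState. With the
// old order, FlushFinal() runs (and, with result spooling, may block until the
// consumer reads all rows) while the ExecNode tree and its resources are still
// open. The fix closes the ExecNode tree first, releasing resources eagerly.
std::vector<std::string> FragmentExecTail(bool close_exec_tree_first) {
  std::vector<std::string> steps;
  if (close_exec_tree_first) steps.push_back("ExecNode tree Close()");  // the fix
  steps.push_back("DataSink::FlushFinal()");  // may block on the consumer
  if (!close_exec_tree_first) steps.push_back("ExecNode tree Close()");  // old order
  steps.push_back("DataSink::Close()");
  return steps;
}
```

With {{close_exec_tree_first}} set, the {{ExecNode}} tree (and anything it holds open) is released before the potentially blocking flush, rather than after it.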



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
