[ https://issues.apache.org/jira/browse/IMPALA-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916660#comment-16916660 ]

ASF subversion and git services commented on IMPALA-8845:
---------------------------------------------------------

Commit 1c4bdcede475395d1139210a5d3ddf2641efa7eb in impala's branch 
refs/heads/master from Michael Ho
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1c4bdce ]

IMPALA-8845: Cancel receiver's streams on exchange node's EOS

When an exchange node reaches its row count limit,
the current code does not notify the sender fragments.
Consequently, sender fragments may keep sending row
batches to the exchange node even though they will
never be dequeued. The senders may end up blocked in
the TransmitData() RPC indefinitely until the query
is cancelled or closed.

This change fixes the problem by cancelling the
exchange node's underlying receiver streams once the
node reaches its row count limit. This unblocks all
senders whose TransmitData() RPCs haven't been
replied to yet. Any future row batch sent to this
receiver is immediately replied to with a response
indicating that the receiver is already closed, so
the sender stops sending further row batches to it.

Change-Id: I10c805e9d63ed8af9f458bf71e8ef5ea9376b939
Reviewed-on: http://gerrit.cloudera.org:8080/14101
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
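As a rough illustration of the mechanism the commit describes, here is a
minimal standalone sketch. The class and method names below are invented for
the example; they are not Impala's actual KRPC receiver types. The point is
only the state transition: once the row limit is reached the stream is
cancelled, and any later TransmitData() call is answered immediately instead
of being queued behind batches that will never be dequeued.

```cpp
#include <string>

// Toy model of an exchange receiver. Once the row count limit is reached,
// the stream is cancelled at EOS; any TransmitData() call arriving after
// that is answered right away with "cancelled" rather than being parked,
// so senders never block on a batch that will never be dequeued.
class ToyReceiver {
 public:
  explicit ToyReceiver(int row_limit) : row_limit_(row_limit) {}

  // Returns "ok" if the batch was accepted, "cancelled" if the stream was
  // already torn down. A real receiver would hold the RPC open instead.
  std::string TransmitData(int num_rows) {
    if (cancelled_) return "cancelled";
    rows_received_ += num_rows;
    // EOS reached: cancel the stream so all blocked senders get a reply.
    if (rows_received_ >= row_limit_) CancelStream();
    return "ok";
  }

  void CancelStream() { cancelled_ = true; }
  bool cancelled() const { return cancelled_; }

 private:
  const int row_limit_;
  int rows_received_ = 0;
  bool cancelled_ = false;
};
```

A driver exercising this model would see the batch that reaches the limit
still accepted, and every subsequent TransmitData() replied to immediately
with "cancelled".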


> Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState
> ------------------------------------------------------------------------
>
>                 Key: IMPALA-8845
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8845
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>            Reporter: Sahil Takiar
>            Assignee: Michael Ho
>            Priority: Major
>
> While testing IMPALA-8818, I found that IMPALA-8780 does not always cause 
> all non-coordinator fragments to shut down. In certain setups, TopN queries 
> ({{select * from [table] order by [col] limit [limit]}}) where all results 
> are successfully spooled still keep non-coordinator fragments alive.
> The issue is that the {{DATASTREAM SINK}} of the TopN <-- Scan Node 
> fragment sometimes ends up blocked waiting for a response to a 
> {{TransmitData()}} RPC, which prevents the fragment from shutting down.
> I haven't traced the issue exactly, but what I *think* is happening is that 
> the {{MERGING-EXCHANGE}} operator in the coordinator fragment hits {{eos}} 
> once it has received enough rows to reach the limit defined in the query, 
> which can occur before the {{DATASTREAM SINK}} has sent all the rows from 
> the TopN / Scan Node fragment. So the TopN / Scan Node fragments end up 
> hanging until they are explicitly closed.
> The fix is to close the {{ExecNode}} tree in {{FragmentInstanceState}} as 
> eagerly as possible. Moving the close call to before the call to 
> {{DataSink::FlushFinal}} fixes the issue. It has the added benefit that it 
> shuts down and releases all {{ExecNode}} resources as soon as it can. When 
> result spooling is enabled, this is particularly important because 
> {{FlushFinal}} might block until the consumer reads all rows.
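The reordering proposed in the description can be sketched as follows. This
is a toy model, not Impala's real {{FragmentInstanceState}} code; the type
and function names are invented for illustration. It only captures the
ordering argument: closing the exec tree first releases node resources even
if the sink's final flush then blocks waiting on the consumer.

```cpp
#include <string>
#include <vector>

// Toy stand-ins for the exec node tree and the data sink. Names are
// invented for this sketch; they are not Impala's real classes.
struct ToyExecNode {
  bool closed = false;
  void Close() { closed = true; }  // releases the node's resources
};

struct ToySink {
  bool flushed = false;
  // In the real system this can block until the consumer reads all rows
  // (notably with result spooling); here it just records that it ran.
  void FlushFinal() { flushed = true; }
};

// Returns the order in which the teardown steps ran. The fix described in
// the issue is to close the ExecNode tree *before* FlushFinal, so node
// resources are released even when the flush blocks for a long time.
std::vector<std::string> FinalizeFragment(ToyExecNode& tree, ToySink& sink) {
  std::vector<std::string> order;
  tree.Close();
  order.push_back("exec_tree_closed");
  sink.FlushFinal();
  order.push_back("sink_flushed");
  return order;
}
```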



