[jira] [Commented] (DRILL-2936) TPCH 4 and 18 SF100 hangs when hash agg is turned off

Steven Phillips (JIRA) Tue, 05 May 2015 20:42:12 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529839#comment-14529839
 ]


Steven Phillips commented on DRILL-2936:
----------------------------------------

It turns out this is caused by a sort-of deadlock condition that can 
arise with the hash-to-merge exchange. The hash-to-merge exchange consists of a 
partition sender and a merging receiver. The partition sender has outgoing 
buckets it sends to the different downstream minor fragments. And each merging 
receiver has an incoming buffer for each of the sending minor fragments.

The merging receiver cannot proceed without data from each of the sending 
fragments. If data from any one of the sending fragments is unavailable, it 
will block until it receives some data from that fragment, or a message 
indicating there is no more data from that fragment.
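
To make the blocking rule concrete, here is a minimal Python sketch of one 
step of a merging receiver. It is only an illustration and not Drill's actual 
code: merge_step, EOS and the deque-based buffers are hypothetical names, and 
records are plain integers standing in for sorted batches.

```python
from collections import deque

EOS = object()  # hypothetical end-of-stream marker, not a Drill type


def merge_step(buffers, done):
    """Try to emit the next smallest record from the incoming buffers.

    buffers: one deque per sending fragment, holding sorted records or EOS.
    done: set of sender indices that have already signalled end-of-stream.
    Returns ("emit", value), ("blocked", sender_index) or ("finished", None).
    """
    # Consume any end-of-stream marker sitting at the head of a buffer.
    for i, buf in enumerate(buffers):
        if i not in done and buf and buf[0] is EOS:
            buf.popleft()
            done.add(i)
    # A live sender with an empty buffer blocks the whole merge.
    for i, buf in enumerate(buffers):
        if i not in done and not buf:
            return ("blocked", i)
    live = [i for i in range(len(buffers)) if i not in done]
    if not live:
        return ("finished", None)
    # Every live sender has a head record, so the smallest one is safe to emit.
    smallest = min(live, key=lambda j: buffers[j][0])
    return ("emit", buffers[smallest].popleft())
```

With buffers [deque([1, 4]), deque()] the first call reports that it is 
blocked on sender 1, even though data from sender 0 is already available. 
Only once sender 1 delivers a record or EOS can the merge emit anything.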

If there is some skew in the data, it's possible that a partition sender may 
not send any data to a particular receiver. That receiver will end up blocking 
because it is waiting for data from that sender. Since it is blocked, it is 
unable to consume the data that it does receive from the other senders. After 
a few batches, those senders also block due to backpressure, because the 
receiver is unable to consume.

Once we reach this state, the query hangs indefinitely.
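
The hang can be reproduced in a toy, single-threaded Python model. This is a 
sketch under simplifying assumptions and not Drill's implementation: simulate, 
plans and capacity are hypothetical names, each batch is just an integer, and 
the end-of-stream marker is assumed to bypass the buffer limit.

```python
from collections import deque

EOS = "EOS"  # hypothetical end-of-stream marker


def simulate(plans, n_receivers, capacity):
    """Run a toy hash-to-merge exchange until it finishes or stops progressing.

    plans[s] lists, in order, the receiver each batch of sender s hashes to.
    capacity is how many batches a receiver buffers per sender before that
    sender is blocked (backpressure).
    """
    n_senders = len(plans)
    # inbox[r][s]: batches from sender s waiting at receiver r.
    inbox = [[deque() for _ in range(n_senders)] for _ in range(n_receivers)]
    pos = [0] * n_senders          # index of the next batch each sender sends
    closed = [False] * n_senders   # sender has sent EOS to every receiver
    ended = [set() for _ in range(n_receivers)]  # EOS seen, per receiver

    progress = True
    while progress:
        progress = False
        # Senders: push the next batch unless the target buffer is full.
        for s, plan in enumerate(plans):
            if pos[s] < len(plan):
                q = inbox[plan[pos[s]]][s]
                if len(q) < capacity:
                    q.append(pos[s])
                    pos[s] += 1
                    progress = True
            elif not closed[s]:
                for r in range(n_receivers):
                    inbox[r][s].append(EOS)
                closed[s] = True
                progress = True
        # Receivers: merge only when every live sender has a batch buffered.
        for r in range(n_receivers):
            for s in range(n_senders):
                q = inbox[r][s]
                if q and q[0] == EOS:
                    q.popleft()
                    ended[r].add(s)
                    progress = True
            live = [s for s in range(n_senders) if s not in ended[r]]
            if live and all(inbox[r][s] for s in live):
                s = min(live, key=lambda j: inbox[r][j][0])
                inbox[r][s].popleft()
                progress = True

    done = all(len(ended[r]) == n_senders for r in range(n_receivers))
    return "finished" if done else "hung"


skewed = [[0] * 10, [1] * 10]          # sender 0 feeds only receiver 0, etc.
balanced = [[0, 1] * 5, [0, 1] * 5]    # both senders alternate receivers
print(simulate(skewed, 2, capacity=2))     # hung
print(simulate(balanced, 2, capacity=2))   # finished
print(simulate(skewed, 2, capacity=10))    # finished
```

In the skewed case each receiver waits on the sender that has nothing for it, 
while that sender is stuck behind the full buffer of the other receiver and 
so never gets to send EOS. In this toy model the same plan completes when the 
buffers are large enough to hold a sender's whole output.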

> TPCH 4 and 18 SF100 hangs when hash agg is turned off
> -----------------------------------------------------
>
>                 Key: DRILL-2936
>                 URL: https://issues.apache.org/jira/browse/DRILL-2936
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>            Reporter: Ramana Inukonda Nagaraj
>            Assignee: Steven Phillips
>            Priority: Critical
>             Fix For: 1.0.0
>
>         Attachments: Screen Shot 2015-05-01 at 2.40.36 PM.png
>
>
> sys options:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirTpch100P> alter system set 
> `planner.memory.max_query_memory_per_node` = 29205777612;
> 0: jdbc:drill:schema=dfs.drillTestDirTpch100P> alter system set 
> `planner.enable_hashjoin`=false;
> 0: jdbc:drill:schema=dfs.drillTestDirTpch100P> alter system set 
> `planner.enable_hashagg`=false;
> {code}
> On executing TPCH 04, the query hangs. From the profiles page, it does not 
> look like any fragments are making progress; the last progress timestamps 
> were from some time back. 
> Attached is the logical plan. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
