[jira] [Commented] (TEZ-2251) Enabling auto reduce parallelism in certain jobs causes DAG to hang

Gopal V (JIRA) Wed, 01 Apr 2015 17:13:30 -0700

    [ 
https://issues.apache.org/jira/browse/TEZ-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391832#comment-14391832
 ]


Gopal V commented on TEZ-2251:
------------------------------

[~rajesh.balamohan]: I think the ideal interpretation of what slow-start should 
achieve is the slower model (or rather, it's not slower; it's the throughput 
model).

The MRv2 model does not distinguish between "one parent" and "all parents", 
because it has exactly 1 vertex as a parent.

Being slower to auto-reduce might make the job faster to complete, for a 
different reason: it would produce more accurate reducer parallelism, since it 
would not rely on the smaller table side finishing fast to determine 
parallelism.

I think you'll get fewer bad estimates if you wait for the big table to reach 
25% completion as well.
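
To make the "one parent" vs. "all parents" difference concrete, here is a rough 
sketch of the estimate. It is not the Tez vertex manager code: SourceProgress, 
estimate_reducers, wait_for_all and the byte counts are all made up for 
illustration, and it assumes output size scales linearly with completed tasks.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SourceProgress:
    """Hypothetical progress snapshot for one parent (source) vertex."""
    name: str
    total_tasks: int
    completed_tasks: int
    output_bytes: int  # output reported by the completed tasks so far


def reached(src: SourceProgress, min_percent: int) -> bool:
    # Integer comparison avoids floating-point edge cases at the threshold.
    return src.completed_tasks * 100 >= src.total_tasks * min_percent


def estimate_reducers(sources: List[SourceProgress], min_percent: int,
                      bytes_per_reducer: int, max_reducers: int,
                      wait_for_all: bool = True) -> Optional[int]:
    """Return a reducer count, or None if it is too early to decide.

    wait_for_all=True  : every parent must reach min_percent ("all parents").
    wait_for_all=False : one parent reaching min_percent is enough.
    """
    flags = [reached(s, min_percent) for s in sources]
    if not (all(flags) if wait_for_all else any(flags)):
        return None

    # Extrapolate each parent's final output from what has finished so far.
    projected = 0
    for s in sources:
        if s.completed_tasks > 0:
            projected += s.output_bytes * s.total_tasks // s.completed_tasks

    needed = -(-projected // bytes_per_reducer)  # ceiling division
    return max(1, min(max_reducers, needed))


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from TEZ-2251.
    small = SourceProgress("small_table", 4, 4, 40_000_000)
    big_early = SourceProgress("big_table", 100, 1, 20_000_000)
    big_later = SourceProgress("big_table", 100, 25, 2_500_000_000)

    # Small side finished, big side has one (unrepresentative) task done.
    print(estimate_reducers([small, big_early], 25, 100_000_000, 1000,
                            wait_for_all=False))
    # Same moment, but waiting for every parent: no decision yet.
    print(estimate_reducers([small, big_early], 25, 100_000_000, 1000))
    # Big side at 25%: decision made from a much larger sample.
    print(estimate_reducers([small, big_later], 25, 100_000_000, 1000))
```

With these made-up numbers, deciding as soon as the small side is done 
extrapolates the big table from a single task and lands on a much lower 
reducer count than the estimate made once the big table is 25% done.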

> Enabling auto reduce parallelism in certain jobs causes DAG to hang
> -------------------------------------------------------------------
>
>                 Key: TEZ-2251
>                 URL: https://issues.apache.org/jira/browse/TEZ-2251
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>         Attachments: TEZ-2251.VertexImpl.patch, 
> TEZ-2251.fix_but_slows_down.patch, hive_console.png, tez_2251_dag.png
>
>
> Scenario:
> - Run TPCH query20 
> (https://github.com/cartershanklin/hive-testbench/blob/master/sample-queries-tpch/tpch_query20.sql)
>  at 1 TB scale (tez-master branch, hive trunk)
> - Enable auto reduce parallelism
> - DAG didn't complete and got stuck in "Reducer 6"
> The vertex parallelism updates for "Reducer 5" and "Reducer 6" happen within 
> a span of 3 milliseconds, and tasks of "Reducer 5" end up producing wrong 
> partition details because they see the updated task numbers of "Reducer 6" 
> when scheduled. This causes the job to hang.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
