[jira] [Commented] (TEZ-2251) Enabling auto reduce parallelism in certain jobs causes DAG to hang

Bikas Saha (JIRA) Thu, 02 Apr 2015 17:30:06 -0700

    [ 
https://issues.apache.org/jira/browse/TEZ-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393807#comment-14393807
 ]


Bikas Saha commented on TEZ-2251:
---------------------------------

bq. Without multiple threads - tasks would always be created before a 
downstream vertex is re-configured
No. A task creates its spec only after it is scheduled. It can be scheduled 
either before the downstream edge is reconfigured (so it gets the spec from the 
original edge) or after the downstream edge is reconfigured (so it gets the 
spec from the new edge).

Rajesh, this particular race condition will not happen in 0.5 or 0.6, since 
everything is single-threaded there.
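
To make the ordering concrete, here is a minimal sketch of the point above. It is 
not Tez code: SketchEdge and specPartitionsAtSchedule are hypothetical names, and 
the task counts 100 and 10 are made up. It only shows that a spec built at 
schedule time reflects whatever the downstream edge looks like at that moment.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SpecRaceSketch {
    /** Hypothetical stand-in for an edge to a downstream vertex (not a Tez class). */
    static final class SketchEdge {
        private final AtomicInteger downstreamTasks;

        SketchEdge(int initialTasks) {
            this.downstreamTasks = new AtomicInteger(initialTasks);
        }

        /** Models auto reduce parallelism shrinking the downstream vertex. */
        void reconfigure(int newTasks) {
            downstreamTasks.set(newTasks);
        }

        int numPartitions() {
            return downstreamTasks.get();
        }
    }

    /**
     * Assumption for illustration: the spec is derived when the task is
     * scheduled, so it reads the edge's current state, not the state that
     * existed when the task object was created.
     */
    static int specPartitionsAtSchedule(SketchEdge edge) {
        return edge.numPartitions();
    }

    public static void main(String[] args) {
        SketchEdge edge = new SketchEdge(100);

        // Task scheduled before the downstream edge is reconfigured.
        int early = specPartitionsAtSchedule(edge);

        edge.reconfigure(10);

        // Task of the same vertex scheduled after the reconfiguration.
        int late = specPartitionsAtSchedule(edge);

        System.out.println("early=" + early + " late=" + late);
    }
}
```

Two tasks of the same upstream vertex end up with different partition counts 
depending only on when they were scheduled relative to the reconfiguration. With 
a single dispatcher thread the two events are handled in a fixed order; with 
multiple threads they can interleave.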

> Enabling auto reduce parallelism in certain jobs causes DAG to hang
> -------------------------------------------------------------------
>
>                 Key: TEZ-2251
>                 URL: https://issues.apache.org/jira/browse/TEZ-2251
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-2251.2.patch, TEZ-2251.3.patch, 
> TEZ-2251.VertexImpl.patch, TEZ-2251.VertexImpl.readlock.patch, 
> TEZ-2251.fix_but_slows_down.patch, hive_console.png, 
> tez-2251.vertexpatch.am.log.gz, tez_2251_dag.png
>
>
> Scenario:
> - Run TPCH query20 
> (https://github.com/cartershanklin/hive-testbench/blob/master/sample-queries-tpch/tpch_query20.sql)
>  at 1 TB scale (tez-master branch, hive trunk)
> - Enable auto reduce parallelism
> - DAG didn't complete and got stuck in "Reducer 6"
> The vertex parallelism changes for "Reducer 5" and "Reducer 6" happen within 
> a span of 3 milliseconds, and tasks of "Reducer 5" end up producing wrong 
> partition details because they see the updated task count of "Reducer 6" 
> when scheduled. This causes the job to hang.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
