[jira] [Updated] (TEZ-2251) Enabling auto reduce parallelism in certain jobs causes DAG to hang

Rajesh Balamohan (JIRA) Wed, 01 Apr 2015 23:00:12 -0700

     [ 
https://issues.apache.org/jira/browse/TEZ-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Rajesh Balamohan updated TEZ-2251:
----------------------------------
    Attachment: TEZ-2251.2.patch

Attaching a patch that works.
- The issue is that "Reducer 6" is in the middle of processing setParallelism 
(with the write lock acquired) in the "App shared pool" threads.
- When a "Reducer 5" task gets scheduled, it should ideally acquire the read 
lock on "Reducer 6" so that it gets the specs properly.
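
For reviewers, the locking idea in the two points above can be sketched roughly
as follows. This is only an illustration and is not the code in the attached
patch: VertexSketch, routedTasks, and readSpec are hypothetical names, and the
only API assumed is the standard java.util.concurrent.locks.ReentrantReadWriteLock.
A writer changes the task count under the write lock, and a reader that takes
the read lock either waits until the change is finished or sees the state from
before it started.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a vertex; this is not Tez's VertexImpl.
class VertexSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private int numTasks;     // current parallelism of the vertex
    private int routedTasks;  // task count the (hypothetical) routing state was built for

    VertexSketch(int numTasks) {
        this.numTasks = numTasks;
        this.routedTasks = numTasks;
    }

    // Writer side: both fields change together under the write lock.
    void setParallelism(int newNumTasks) {
        lock.writeLock().lock();
        try {
            numTasks = newNumTasks;
            // Between these two assignments the state is inconsistent;
            // the write lock keeps readers out of this window.
            routedTasks = newNumTasks;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Reader side: returns {numTasks, routedTasks} as one consistent snapshot.
    int[] readSpec() {
        lock.readLock().lock();
        try {
            return new int[] { numTasks, routedTasks };
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        VertexSketch reducer6 = new VertexSketch(10);
        // The writer plays the role of the thread running setParallelism.
        Thread writer = new Thread(() -> reducer6.setParallelism(4));
        writer.start();
        // The reader plays the role of a "Reducer 5" task being scheduled.
        int[] spec = reducer6.readSpec(); // both values come from the same state
        writer.join();
        System.out.println(spec[0] + " tasks, routing built for " + spec[1]);
    }
}
```

If readSpec skipped the read lock, the reader could observe the new numTasks
together with the old routedTasks, which is the kind of mismatch described
above.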

[~bikassaha] Can you please review?

> Enabling auto reduce parallelism in certain jobs causes DAG to hang
> -------------------------------------------------------------------
>
>                 Key: TEZ-2251
>                 URL: https://issues.apache.org/jira/browse/TEZ-2251
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>         Attachments: TEZ-2251.2.patch, TEZ-2251.VertexImpl.patch, 
> TEZ-2251.VertexImpl.readlock.patch, TEZ-2251.fix_but_slows_down.patch, 
> hive_console.png, tez-2251.vertexpatch.am.log.gz, tez_2251_dag.png
>
>
> Scenario:
> - Run TPCH query20 
> (https://github.com/cartershanklin/hive-testbench/blob/master/sample-queries-tpch/tpch_query20.sql)
>  at 1 TB scale (tez-master branch, hive trunk)
> - Enable auto reduce parallelism
> - The DAG does not complete and gets stuck in "Reducer 6"
> The vertex parallelism updates for "Reducer 5" and "Reducer 6" happen within 
> a span of 3 milliseconds, and tasks of "Reducer 5" end up producing wrong 
> partition details because they see the updated task count of "Reducer 6" 
> when scheduled. This causes the job to hang.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
