[
https://issues.apache.org/jira/browse/TEZ-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392949#comment-14392949
]
Bikas Saha commented on TEZ-2251:
---------------------------------
LGTM. Sorry for the confusion. Essentially, setParallelism and getInputSpec
must synchronize on the same lock.
Minor items to fix before commit:
1) Add a similar comment to the other changed method. Could you please
mention this JIRA in it for reference.
2) Avoid the unnecessary double lookups into maps, once for the precondition
and once for the real work.
3) The precondition message is missing the logIdentifier of the vertex.
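
To illustrate the locking point and items 2 and 3, here is a minimal sketch. It is not the Tez VertexImpl code or the attached patch: the class VertexSketch, its fields, and the method signatures are hypothetical and exist only for this example. The assumption is that the parallelism update and the spec lookup share one read/write lock, so a reader never sees a half-applied update.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a vertex; NOT the real Tez VertexImpl.
class VertexSketch {
    // One lock shared by the writer (setParallelism) and the reader (getInputSpec).
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final String logIdentifier;
    private final Map<String, Integer> sourcePartitions = new HashMap<>();
    private int numTasks;

    VertexSketch(String logIdentifier, int numTasks) {
        this.logIdentifier = logIdentifier;
        this.numTasks = numTasks;
    }

    // Writer: the task count and the per-source partition counts change
    // together under the write lock.
    void setParallelism(int newNumTasks, Map<String, Integer> newPartitions) {
        lock.writeLock().lock();
        try {
            numTasks = newNumTasks;
            sourcePartitions.putAll(newPartitions);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Reader: takes the read side of the same lock, so it sees either the
    // old values or the new values, never a mix.
    String getInputSpec(String source, int taskIndex) {
        lock.readLock().lock();
        try {
            // Single map lookup, reused for the precondition and the real work.
            Integer partitions = sourcePartitions.get(source);
            if (partitions == null) {
                // The message names the vertex so the log line is traceable.
                throw new IllegalStateException(
                    "Unknown source " + source + " for vertex " + logIdentifier);
            }
            if (taskIndex < 0 || taskIndex >= numTasks) {
                throw new IllegalArgumentException(
                    "Invalid task index " + taskIndex + " for vertex " + logIdentifier);
            }
            return source + ":" + partitions + "/" + numTasks;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

A read/write lock is one possible choice here; a plain synchronized block on a shared monitor would give the same visibility guarantee at the cost of serializing readers.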
> Enabling auto reduce parallelism in certain jobs causes DAG to hang
> -------------------------------------------------------------------
>
> Key: TEZ-2251
> URL: https://issues.apache.org/jira/browse/TEZ-2251
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rajesh Balamohan
> Attachments: TEZ-2251.2.patch, TEZ-2251.VertexImpl.patch,
> TEZ-2251.VertexImpl.readlock.patch, TEZ-2251.fix_but_slows_down.patch,
> hive_console.png, tez-2251.vertexpatch.am.log.gz, tez_2251_dag.png
>
>
> Scenario:
> - Run TPCH query20
> (https://github.com/cartershanklin/hive-testbench/blob/master/sample-queries-tpch/tpch_query20.sql)
> at 1 TB scale (tez-master branch, hive trunk)
> - Enable auto reduce parallelism
> - DAG didn't complete and got stuck in "Reducer 6"
> The vertex parallelism updates for "Reducer 5" and "Reducer 6" happen within
> a span of 3 milliseconds, and tasks of "Reducer 5" end up producing wrong
> partition details because they see the updated task count of "Reducer 6"
> when scheduled. This causes the job to hang.