[
https://issues.apache.org/jira/browse/TEZ-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092728#comment-14092728
]
Rajesh Balamohan commented on TEZ-1400:
---------------------------------------
In this case, Hive submitted the plan to Tez, and Hive does not set the
min/max fractions anywhere. Somewhere in the Tez code, min/max are being
reset to 0.0 before reaching ShuffleVertexManager, which causes this issue.
For example, I checked the min/max values when "Map 7" goes through
VertexImpl -> InitTransition -> transition() -> setupVertex(); by that point
the min/max values in appContext.getConf() had been reset to 0.0. Need to
check why this is happening. IMO, the values in appContext.getAMConf()
should not be modified during vertex setup. Please correct me if this
assumption is wrong.
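The defensive-copy pattern suggested above (per-vertex setup should work on
a private copy of the AM configuration rather than mutating the shared one)
can be sketched in plain Java. java.util.Properties stands in for Hadoop's
Configuration here, and the fraction key names are taken from the
ShuffleVertexManager discussion above; the class and variable names are
illustrative, not actual Tez internals.

```java
import java.util.Properties;

public class DefensiveCopyDemo {
    // Keys mirroring the shuffle-vertex-manager fractions discussed above.
    static final String MIN_KEY = "tez.shuffle-vertex-manager.min-src-fraction";
    static final String MAX_KEY = "tez.shuffle-vertex-manager.max-src-fraction";

    public static void main(String[] args) {
        // AM-level configuration, shared by every vertex in the DAG.
        Properties amConf = new Properties();
        amConf.setProperty(MIN_KEY, "0.25");
        amConf.setProperty(MAX_KEY, "0.75");

        // Wrong: writing through the shared conf during one vertex's
        // setup would clobber the fractions for every later vertex:
        //   amConf.setProperty(MIN_KEY, "0.0");

        // Right: clone the shared conf, then apply per-vertex overrides
        // only to the private copy.
        Properties vertexConf = new Properties();
        vertexConf.putAll(amConf);
        vertexConf.setProperty(MIN_KEY, "0.0"); // per-vertex override

        // The AM-level values survive untouched.
        System.out.println("AM min:     " + amConf.getProperty(MIN_KEY));
        System.out.println("Vertex min: " + vertexConf.getProperty(MIN_KEY));
    }
}
```

With this pattern, later vertices initializing from appContext.getAMConf()
would still see the original fractions instead of 0.0.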
> Reducers stuck when enabling auto-reduce parallelism (MRR case)
> ---------------------------------------------------------------
>
> Key: TEZ-1400
> URL: https://issues.apache.org/jira/browse/TEZ-1400
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.5.0
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Labels: performance
> Attachments: TEZ-1400.1.patch, dag.dot
>
>
> In an M -> R1 -> R2 case, if R1 is optimized by auto-parallelism, R2 gets
> stuck waiting for events.
> e.g.
> Map 1: 0/1 Map 2: -/- Map 5: 0/1 Map 6: 0/1 Map 7: 0/1
> Reducer 3: 0/23 Reducer 4: 0/1
> ...
> ...
> Map 1: 1/1 Map 2: 148(+13)/161 Map 5: 1/1 Map 6: 1/1 Map
> 7: 1/1 Reducer 3: 0(+3)/3 Reducer 4: 0(+1)/1 <== Auto reduce
> parallelism kicks in
> ..
> Map 1: 1/1 Map 2: 161/161 Map 5: 1/1 Map 6: 1/1 Map 7: 1/1
> Reducer 3: 3/3 Reducer 4: 0(+1)/1
> Job is stuck waiting for events in Reducer 4.
> [fetcher [Reducer_3] #23]
> org.apache.tez.runtime.library.common.shuffle.impl.ShuffleScheduler: copy(3
> of 23 at 0.02 MB/s) <=== Waiting for 20 more partitions, even though
> Reducer 3 has been optimized to use 3 reducers
--
This message was sent by Atlassian JIRA
(v6.2#6252)