Rajesh Balamohan created TEZ-1400:
-------------------------------------
Summary: Reducers stuck when enabling auto-reduce parallelism (MRR case)
Key: TEZ-1400
URL: https://issues.apache.org/jira/browse/TEZ-1400
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
Attachments: dag.dot
In an M -> R1 -> R2 DAG, if auto-reduce parallelism lowers the number of R1
tasks, R2 gets stuck waiting for events.
e.g.
Map 1: 0/1 Map 2: -/- Map 5: 0/1 Map 6: 0/1 Map 7: 0/1 Reducer 3: 0/23 Reducer 4: 0/1
...
...
Map 1: 1/1 Map 2: 148(+13)/161 Map 5: 1/1 Map 6: 1/1 Map 7: 1/1 Reducer 3: 0(+3)/3 Reducer 4: 0(+1)/1 <== Auto reduce parallelism kicks in
..
Map 1: 1/1 Map 2: 161/161 Map 5: 1/1 Map 6: 1/1 Map 7: 1/1 Reducer 3: 3/3 Reducer 4: 0(+1)/1
The job is stuck waiting for events in Reducer 4.
[fetcher [Reducer_3] #23] org.apache.tez.runtime.library.common.shuffle.impl.ShuffleScheduler: copy(3 of 23 at 0.02 MB/s)
<=== Waiting for 20 more partitions, even though Reducer 3 has been optimized
to use 3 reducers
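
The "copy(3 of 23 ...)" line suggests that Reducer 4 is still counting against
the original Reducer 3 task count. The following is a toy sketch of that
symptom only, not Tez code: ToyShuffleTracker, on_parallelism_updated and run
are hypothetical names, and the assumption that the consumer never receives
the updated task count is a reading of the log, not a confirmed root cause.

```python
from dataclasses import dataclass


@dataclass
class ToyShuffleTracker:
    """Hypothetical stand-in for a consumer-side shuffle tracker (not Tez code)."""

    expected_inputs: int       # upstream tasks the consumer believes it must fetch from
    completed_inputs: int = 0  # upstream outputs copied so far

    def on_input_copied(self) -> None:
        self.completed_inputs += 1

    def on_parallelism_updated(self, new_task_count: int) -> None:
        # Assumed missing step: tell the consumer that the upstream vertex shrank.
        self.expected_inputs = new_task_count

    def is_done(self) -> bool:
        return self.completed_inputs >= self.expected_inputs

    def progress(self) -> str:
        return f"copy({self.completed_inputs} of {self.expected_inputs})"


def run(apply_update: bool) -> ToyShuffleTracker:
    # Reducer 3 is initially planned with 23 tasks, as in the first status line.
    tracker = ToyShuffleTracker(expected_inputs=23)
    if apply_update:
        # Auto-reduce parallelism lowers Reducer 3 to 3 tasks.
        tracker.on_parallelism_updated(3)
    # Only 3 Reducer 3 tasks ever run and produce output.
    for _ in range(3):
        tracker.on_input_copied()
    return tracker
```

With run(False) the tracker never finishes, which mirrors the stuck Reducer 4;
with run(True) it completes after the third copy.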