Rajesh Balamohan created TEZ-1400:
-------------------------------------

             Summary: Reducers stuck when enabling auto-reduce parallelism (MRR case)
                 Key: TEZ-1400
                 URL: https://issues.apache.org/jira/browse/TEZ-1400
             Project: Apache Tez
          Issue Type: Bug
    Affects Versions: 0.5.0
            Reporter: Rajesh Balamohan
            Assignee: Rajesh Balamohan
         Attachments: dag.dot

In the M -> R1 -> R2 case, if the parallelism of R1 is reduced by auto-reduce 
parallelism, R2 gets stuck waiting for events.

For example:

Map 1: 0/1      Map 2: -/-      Map 5: 0/1      Map 6: 0/1      Map 7: 0/1      Reducer 3: 0/23 Reducer 4: 0/1
...
...
Map 1: 1/1      Map 2: 148(+13)/161     Map 5: 1/1      Map 6: 1/1      Map 7: 1/1      Reducer 3: 0(+3)/3      Reducer 4: 0(+1)/1  <== Auto-reduce parallelism kicks in
..
Map 1: 1/1      Map 2: 161/161  Map 5: 1/1      Map 6: 1/1      Map 7: 1/1      Reducer 3: 3/3  Reducer 4: 0(+1)/1

The job is stuck waiting for events in Reducer 4.

 [fetcher [Reducer_3] #23] org.apache.tez.runtime.library.common.shuffle.impl.ShuffleScheduler: copy(3 of 23 at 0.02 MB/s) <=== Waiting for 20 more partitions, even though Reducer 3 has been optimized to use only 3 tasks
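
The symptom in the log above (3 of 23 copied, 20 still awaited) can be reproduced with a small model. This is only a sketch and not Tez code: ShuffleTracker, on_upstream_parallelism_changed and run are hypothetical names, and it assumes the consumer's expected input count was fixed at 23 before Reducer 3 was reconfigured to 3 tasks and was never updated. The actual root cause in Tez may differ.

```python
from dataclasses import dataclass


@dataclass
class ShuffleTracker:
    """Hypothetical stand-in for a consumer-side shuffle scheduler (not Tez code)."""
    expected_inputs: int  # number of upstream task outputs the consumer waits for
    copied: int = 0       # number of upstream outputs fetched so far

    def on_input_copied(self) -> None:
        self.copied += 1

    def on_upstream_parallelism_changed(self, new_task_count: int) -> None:
        # Assumed fix: the consumer learns the upstream's reduced task count.
        self.expected_inputs = new_task_count

    def remaining(self) -> int:
        return self.expected_inputs - self.copied

    def is_done(self) -> bool:
        return self.copied >= self.expected_inputs


def run(initial_tasks: int, actual_tasks: int, propagate_update: bool) -> ShuffleTracker:
    """Simulate a consumer fetching from an upstream vertex whose parallelism shrank."""
    tracker = ShuffleTracker(expected_inputs=initial_tasks)
    if propagate_update:
        tracker.on_upstream_parallelism_changed(actual_tasks)
    # Only the tasks that really ran produce output to fetch.
    for _ in range(actual_tasks):
        tracker.on_input_copied()
    return tracker


# Values from the report: Reducer 3 planned with 23 tasks, reduced to 3.
stuck = run(initial_tasks=23, actual_tasks=3, propagate_update=False)
fixed = run(initial_tasks=23, actual_tasks=3, propagate_update=True)
print(stuck.copied, stuck.expected_inputs, stuck.is_done())
print(fixed.copied, fixed.expected_inputs, fixed.is_done())
```

In this model the first tracker never finishes because it still waits for 20 inputs that no task will produce, while the second finishes once the reduced task count is propagated to it.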




