Rahul Jain created MAPREDUCE-4560:
-------------------------------------

             Summary: Job can get stuck in a deadlock between mappers and 
reducers for low values of mapreduce.job.reduce.slowstart.completedmaps (<<1)
                 Key: MAPREDUCE-4560
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4560
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: Rahul Jain
             Fix For: 2.0.0-alpha


In our lab systems, this issue has been seen only with MapReduceV2, never with 
MapReduceV1.

The parameter mapreduce.job.reduce.slowstart.completedmaps was set to 0.05 (the 
default value).

We found the Application Master stuck in a deadlock between mappers and 
reducers, with the job making no progress. The sequence appears to be:

1. Initially, the available map/reduce slots were allocated to mappers.
2. Once the mappers made progress and a few of them completed, reducers started 
occupying some of the slots because of the low value of the above config param.
3. The scheduler does not appear to give mappers priority over reducers; after 
a while, all slots in our system were occupied by reducers.
4. Since some mapper tasks had still not been assigned a slot, the map phase 
never completed.
5. The system entered a deadlock in which reducers occupy all available slots 
but are waiting for the mappers to complete, while the mappers cannot move 
forward because no slot is available.
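
The sequence above can be illustrated with a small toy model. This is only a 
sketch under stated assumptions, not the actual MRv2 Application Master or 
scheduler logic: the class name, the method simulate(), and the policy are all 
hypothetical. It assumes every map takes one tick, reducers hold their slot 
until all maps have finished, nothing is preempted, and eligible reducers are 
given free slots before pending mappers.

```java
public class SlowstartDeadlockSketch {

    /** Outcome of one simulated job in this toy model. */
    public enum Outcome { COMPLETED, STUCK }

    /**
     * Hypothetical model of the reported sequence; NOT Hadoop's scheduler.
     * slowstart plays the role of
     * mapreduce.job.reduce.slowstart.completedmaps.
     */
    public static Outcome simulate(int slots, int numMaps, int numReduces,
                                   double slowstart) {
        int pendingMaps = numMaps;
        int runningMaps = 0;
        int completedMaps = 0;
        int pendingReduces = numReduces;
        int runningReduces = 0;
        // Reducers become eligible once this many maps have completed.
        int threshold = (int) Math.ceil(slowstart * numMaps);

        while (true) {
            // Assumption: every running map finishes in one tick.
            completedMaps += runningMaps;
            runningMaps = 0;

            // Once the map phase is done nothing blocks the reducers any
            // more, so the model treats the job as able to complete.
            if (completedMaps == numMaps) {
                return Outcome.COMPLETED;
            }

            // Reducers never give their slots back while maps are pending.
            int free = slots - runningReduces;

            // Assumed policy: eligible reducers take free slots first.
            if (completedMaps >= threshold) {
                int startReduces = Math.min(free, pendingReduces);
                runningReduces += startReduces;
                pendingReduces -= startReduces;
                free -= startReduces;
            }

            // Whatever is left goes to pending mappers.
            int startMaps = Math.min(free, pendingMaps);
            runningMaps = startMaps;
            pendingMaps -= startMaps;

            // Maps are still pending but none could be started: reducers
            // hold every slot and wait for maps that can never run.
            if (runningMaps == 0) {
                return Outcome.STUCK;
            }
        }
    }

    public static void main(String[] args) {
        // 10 slots, 40 maps, 10 reduces: illustrative numbers only.
        System.out.println("slowstart=0.05: " + simulate(10, 40, 10, 0.05));
        System.out.println("slowstart=1.0:  " + simulate(10, 40, 10, 1.0));
    }
}
```

In this model the low threshold lets reducers claim every slot after the first 
wave of maps, which leaves the remaining maps without a slot; with a threshold 
of 1.0 no reducer starts before the map phase has finished.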

The workaround in our system was to set 
mapreduce.job.reduce.slowstart.completedmaps=1, so that reducers are scheduled 
only after all mappers have completed; the issue was no longer seen.
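
For illustration, the workaround might be expressed as a Hadoop configuration 
property along the following lines. The property name and value come from this 
report; placing it in mapred-site.xml is an assumption, since the report does 
not say where the setting was applied.

```xml
<!-- Assumed location: mapred-site.xml, or an equivalent per-job configuration -->
<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <!-- 1 = start reducers only after all maps have completed -->
  <value>1</value>
</property>
```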

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
