Bob created MAPREDUCE-6485:
------------------------------

             Summary: MR job hangs forever because all resources are taken up 
by reducers and the last map attempt never gets a resource to run
                 Key: MAPREDUCE-6485
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6485
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: applicationmaster
    Affects Versions: 3.0.0
            Reporter: Bob
            Priority: Critical


The scenario is as follows:
With mapreduce.job.reduce.slowstart.completedmaps=0.8, reduces take resources 
and start to run before all maps have finished. 
It can happen that all resources are taken up by running reduces while one 
map has still not finished. 
In this situation, the last map has two task attempts.
The first attempt was killed due to a timeout (mapreduce.task.timeout), and 
its state transitioned from RUNNING to FAIL_CONTAINER_CLEANUP, so this failed 
attempt will not run again. 
The second attempt, started because map task speculative execution is enabled, 
is pending in the UNASSIGNED state because no resources are available. 
However, the second map attempt's request has lower priority than the reduces, 
so preemption does not happen.
As a result, the reduces cannot finish because one map is still left, and the 
last map hangs because no resources are available, so the job never finishes.
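To make the circular wait concrete, here is a minimal toy model in Java. It is a hypothetical sketch, not Hadoop code: the class SlowstartHangModel, its methods, and the priority numbers are invented for illustration, and the preemption rule (a pending request can only reclaim a container held by a less important request) is a simplification of the behavior described above, not the actual MR ApplicationMaster allocator logic.

```java
/**
 * Hypothetical toy model of the hang described in MAPREDUCE-6485.
 * Not Hadoop code: names, priorities and the preemption rule are
 * simplifications for illustration only.
 */
public class SlowstartHangModel {

    /**
     * Reduces may be scheduled once the completed fraction of maps reaches
     * the slowstart threshold (mirrors the meaning of
     * mapreduce.job.reduce.slowstart.completedmaps).
     */
    static boolean reducesMayStart(int completedMaps, int totalMaps, double slowstart) {
        return (double) completedMaps / totalMaps >= slowstart;
    }

    /**
     * A pending request can run if a container is free or, under the assumed
     * rule, if some running container has strictly lower priority and could
     * be preempted. In this sketch a larger number means more important.
     */
    static boolean pendingCanRun(int freeContainers, int[] runningPriorities, int pendingPriority) {
        if (freeContainers > 0) {
            return true;
        }
        for (int p : runningPriorities) {
            if (p < pendingPriority) {
                return true; // a less important container could be preempted
            }
        }
        return false; // nothing free and nothing preemptable: the request waits
    }

    public static void main(String[] args) {
        final int reducePriority = 2; // assumed: reduces outrank map attempts
        final int mapPriority = 1;

        // All four containers of the toy cluster are held by running reduces.
        int[] running = {reducePriority, reducePriority, reducePriority, reducePriority};

        // 9 of 10 maps are done, so reduces were allowed to start.
        System.out.println("reduces may start at 9/10 maps: " + reducesMayStart(9, 10, 0.8));

        // The speculative attempt of the last map asks for a container.
        System.out.println("last map attempt can run: " + pendingCanRun(0, running, mapPriority));
    }
}
```

Running main prints that reduces may start at 9/10 maps (true) but the last map attempt cannot run (false). That is the deadlock: the reduces wait for the last map, and the last map waits for a container the reduces hold. In the model, giving the map attempt a higher priority than the reduces, for example pendingCanRun(0, running, 3), lets it preempt a reducer and breaks the cycle.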



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
