Herman Chen created MAPREDUCE-4304:
--------------------------------------

             Summary: Deadlock where all containers are held by 
ApplicationMasters should be prevented
                 Key: MAPREDUCE-4304
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4304
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: mrv2, resourcemanager
    Affects Versions: 0.23.1
            Reporter: Herman Chen


In my test cluster with 4 NodeManagers, each with only ~1.6G of container 
memory, when a burst of jobs (e.g. more than 10) is submitted concurrently, it 
is likely that 4 jobs are accepted and 4 ApplicationMasters are allocated, but 
the jobs then block each other indefinitely because they are all waiting to 
allocate more containers.
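
The scenario can be reproduced on paper with a small simulation. This is only 
a sketch under simplifying assumptions that are not part of the report: each 
NodeManager is modelled as fitting exactly one container, every job needs one 
ApplicationMaster container plus at least one task container, and the 
scheduler hands a free slot to the next pending ApplicationMaster. The 
function name and its parameters are hypothetical.

```python
def simulate_burst(num_nodes, slots_per_node, num_jobs):
    """Greedy allocation: each pending job gets an AM container while slots remain.

    Assumption (not from the report): every AM needs at least one more
    container for a task before its job can make progress.
    """
    free_slots = num_nodes * slots_per_node
    running_ams = 0

    # Hand out ApplicationMaster containers in submission order.
    for _ in range(num_jobs):
        if free_slots == 0:
            break
        free_slots -= 1
        running_ams += 1

    # If AMs hold every slot, no AM can ever obtain a task container,
    # so no job finishes and no slot is ever released.
    deadlocked = running_ams > 0 and free_slots == 0
    return {"ams": running_ams, "free": free_slots, "deadlocked": deadlocked}


if __name__ == "__main__":
    print(simulate_burst(num_nodes=4, slots_per_node=1, num_jobs=10))
    print(simulate_burst(num_nodes=4, slots_per_node=1, num_jobs=3))
```

With 4 single-slot nodes and a burst of 10 jobs, the model ends with 4 
ApplicationMasters and no free slot; with only 3 jobs, one slot stays free for 
task containers.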

Note that the problem is not limited to a tiny cluster like this.  As long as 
jobs are submitted faster than they finish, the cluster may run into a vicious 
cycle where more and more containers are locked up by ApplicationMasters.
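
One possible way to break the cycle is to cap the share of containers that 
ApplicationMasters may hold, so that some capacity always remains for task 
containers. The sketch below is a hypothetical admission rule for 
illustration, not the fix proposed or adopted for this issue; the function 
name, the `max_am_fraction` parameter, and the slot-based model are all 
assumptions.

```python
import math


def admit_ams(total_slots, num_jobs, max_am_fraction):
    """Admit AMs only up to a fixed fraction of the cluster's container slots.

    Returns (running_ams, waiting_jobs, slots_left_for_tasks).
    Hypothetical model: one slot per container, one AM per job.
    """
    # Upper bound on slots that AMs are allowed to occupy.
    am_limit = math.floor(total_slots * max_am_fraction)

    running_ams = min(num_jobs, am_limit)
    waiting_jobs = num_jobs - running_ams      # stay queued, hold no container
    task_slots = total_slots - running_ams     # reserved for task containers
    return running_ams, waiting_jobs, task_slots


if __name__ == "__main__":
    # With a 50% cap on a 4-slot cluster, two slots remain for tasks.
    print(admit_ams(total_slots=4, num_jobs=10, max_am_fraction=0.5))
    # With no effective cap, AMs take every slot, as in the report.
    print(admit_ams(total_slots=4, num_jobs=10, max_am_fraction=1.0))
```

Under this rule the remaining jobs wait in the queue without holding a 
container, so running jobs can still obtain task containers and finish.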


        
