[
https://issues.apache.org/jira/browse/MAPREDUCE-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287762#comment-13287762
]
Ahmed Radwan commented on MAPREDUCE-4304:
-----------------------------------------
But even for the capacity scheduler, what is a good strategy for tuning this
property, especially when there is variability in the sizes of jobs and the
resources they require? This can be really hard, and can end up in either
underutilization of the cluster or the described deadlock.
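To make the trade-off concrete, here is a toy steady-state model in Python.
It is only a sketch: the function name, the "one slot per container"
simplification, and the am_fraction parameter (standing in for whatever cap
the scheduler puts on the share of slots AMs may hold) are all assumptions of
mine, not actual scheduler code or configuration.

```python
import math

def simulate(total_slots, am_fraction, pending_jobs, task_slots_per_job):
    """Toy steady state: return (am_count, deadlocked, task_share).

    Hypothetical model: every container (AM or task) takes one slot, and
    am_fraction caps the share of slots that AMs may hold.
    """
    # The cap limits how many slots AMs may hold; always allow at least one AM.
    am_cap = max(1, math.floor(total_slots * am_fraction))
    am_count = min(pending_jobs, am_cap)
    free = total_slots - am_count
    # Every running AM wants task_slots_per_job containers for its tasks.
    task_slots = min(free, am_count * task_slots_per_job)
    # Deadlock: AMs are running but no slot is left for any task.
    deadlocked = am_count > 0 and task_slots == 0
    # Share of the cluster that is doing actual task work.
    task_share = task_slots / total_slots
    return am_count, deadlocked, task_share
```

With no effective cap, simulate(4, 1.0, 10, 2) lets AMs take all 4 slots and
reports a deadlock. With a tight cap and small jobs, simulate(40, 0.1, 10, 1)
admits only 4 AMs, so only 4 of 40 slots run tasks while 6 jobs wait. A value
that is safe for one job mix is wasteful for another.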
A possible approach would be to preempt AM containers whose resource
requirements cannot be fulfilled within a period of time. The preempted
applications can go back to the queue and be restarted only when there are
enough resources. The scheduler can keep track of the required resources to
avoid recomputing them when the AM is restarted. Thoughts?
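A rough sketch of what I have in mind, written as a toy Python model rather
than ResourceManager code. Everything here is hypothetical: the ToyScheduler
and App classes, the one-slot AM, the FIFO queue, and the tick-based timeout
are assumptions made only to illustrate the idea.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

AM_SLOTS = 1  # assumption: every AM occupies exactly one container slot

@dataclass
class App:
    name: str
    task_slots: int                     # containers the AM asks for once it runs
    known_need: Optional[int] = None    # request remembered after a preemption
    am_started_at: Optional[int] = None

class ToyScheduler:
    def __init__(self, total_slots: int, am_timeout: int):
        self.free = total_slots
        self.am_timeout = am_timeout
        self.queue = deque()   # apps waiting for an AM container
        self.starved = []      # AM is running but its request is unfulfilled
        self.running = []      # AM and all task containers are allocated

    def submit(self, app: App) -> None:
        self.queue.append(app)

    def finish(self, app: App) -> None:
        self.running.remove(app)
        self.free += AM_SLOTS + app.task_slots

    def tick(self, now: int) -> None:
        # 1. Preempt AMs whose request stayed unfulfilled for too long and
        #    remember what they asked for.
        for app in list(self.starved):
            if now - app.am_started_at >= self.am_timeout:
                self.starved.remove(app)
                self.free += AM_SLOTS
                app.known_need = app.task_slots
                app.am_started_at = None
                self.queue.append(app)
        # 2. Serve AMs that are still waiting, oldest first.
        for app in list(self.starved):
            if self.free >= app.task_slots:
                self.free -= app.task_slots
                self.starved.remove(app)
                self.running.append(app)
        # 3. Admit from the queue in FIFO order. A preempted app restarts
        #    only if its AM plus its remembered request fit together.
        while self.queue:
            app = self.queue[0]
            restart = app.known_need is not None
            needed = AM_SLOTS + (app.known_need if restart else 0)
            if self.free < needed:
                break
            self.queue.popleft()
            self.free -= needed
            if restart:
                self.running.append(app)
            else:
                app.am_started_at = now
                self.starved.append(app)
```

With 4 slots, a timeout of 3 ticks, and four jobs that each need 2 task
containers, all four AMs start at tick 0 and block each other. At tick 3 they
are preempted, and the first job restarts with its AM and both task containers
while the other three wait in the queue. Preempting every timed-out AM at once
is deliberately blunt; a real policy would probably preempt only as many as
needed.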
> Deadlock where all containers are held by ApplicationMasters should be
> prevented
> --------------------------------------------------------------------------------
>
> Key: MAPREDUCE-4304
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4304
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mrv2, resourcemanager
> Affects Versions: 0.23.1
> Reporter: Herman Chen
>
> In my test cluster with 4 NodeManagers, each with only ~1.6G container
> memory, when a burst of jobs, e.g. >10, are concurrently submitted, it is
> likely that 4 jobs are accepted, with 4 ApplicationMasters allocated, but
> then the jobs block each other indefinitely because they're all waiting to
> allocate more containers.
> Note that the problem is not limited to a tiny cluster like this. As long as
> the rate at which jobs are submitted is greater than the rate at which jobs
> finish, the cluster may run into a vicious cycle where more and more
> containers are locked up by ApplicationMasters.