[
https://issues.apache.org/jira/browse/MAPREDUCE-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ahmed Radwan updated MAPREDUCE-2788:
------------------------------------
Attachment: MAPREDUCE-2788_rev3.patch
Thanks, Arun, for your comments. I have looked into normalizing the requests
within the CapacityScheduler.
It doesn't seem that the call to LeafQueue.assignContainer(..) comes via
CapacityScheduler.allocate(). It is reached through the call path:
LeafQueue.assignContainer(..) <- assignNodeLocalContainers(..) <-
LeafQueue.assignContainersOnNode(..) <- LeafQueue.assignContainers(..)
There are alternative paths, but all lead to the same source.
The SchedulerApp application (in the LeafQueue.assignContainers(..) call) is
one of the Map<ApplicationAttemptId, SchedulerApp> applicationsMap values. This
applicationsMap is only populated through LeafQueue.addApplication(..).
LeafQueue.addApplication(..) is in turn called through the path:
LeafQueue.addApplication(..) <- LeafQueue.submitApplication(..) <-
CapacityScheduler.addApplication(..).
So I have added code to CapacityScheduler.addApplication(..) to normalize all
resource requests for the SchedulerApp before submitting to the queue.
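As a rough illustration of the normalization described above (this is not the
code in the attached patch), the idea can be sketched in plain Java. The class
NormalizeSketch, the helpers normalizeMemory(..) and containersThatFit(..), and
the 1024 MB minimum are all hypothetical names and values chosen for the
example; the division is only a stand-in for whatever arithmetic in
assignContainer(..) divides by the requested memory.

```java
public class NormalizeSketch {

    // Hypothetical helper: round a memory request up to the nearest multiple
    // of the minimum allocation, so the result is never zero.
    static int normalizeMemory(int requestedMb, int minimumMb) {
        if (minimumMb <= 0) {
            throw new IllegalArgumentException("minimum allocation must be positive");
        }
        // Requests below the minimum (including 0) are raised to the minimum.
        int clamped = Math.max(requestedMb, minimumMb);
        // Ceiling division, then scale back up to a multiple of the minimum.
        return ((clamped + minimumMb - 1) / minimumMb) * minimumMb;
    }

    // Stand-in for scheduler arithmetic that divides by the requested memory.
    // With requestMb == 0 this throws ArithmeticException (divide by zero).
    static int containersThatFit(int availableMb, int requestMb) {
        return availableMb / requestMb;
    }

    public static void main(String[] args) {
        int minimumMb = 1024; // assumed minimum allocation for the example
        int rawRequestMb = 0; // the problematic request from the issue

        // Normalizing before the request reaches the queue avoids the
        // zero divisor entirely.
        int normalizedMb = normalizeMemory(rawRequestMb, minimumMb);
        System.out.println("normalized request (MB): " + normalizedMb);
        System.out.println("containers that fit: "
                + containersThatFit(8192, normalizedMb));
    }
}
```

Doing this once at submission time means every later code path that reads the
request sees a non-zero capability, rather than each path guarding against
zero on its own.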
If the LeafQueue is inextricably tied to CS, we may need to update the
references in LeafQueue to use CapacityScheduler instead of
CapacitySchedulerContext. This would make the dependency explicit and avoid
future confusion. I haven't made this interface change in the attached patch,
as it requires further changes to other components, but if we agree on it, I
can do it in a follow-up issue.
> LeafQueue.assignContainer() can cause a crash if
> request.getCapability().getMemory() == 0
> -----------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-2788
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2788
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Reporter: Ahmed Radwan
> Assignee: Ahmed Radwan
> Priority: Critical
> Attachments: MAPREDUCE-2788.patch, MAPREDUCE-2788_rev2.patch,
> MAPREDUCE-2788_rev3.patch
>
>
> The assignContainer() method in
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue
> can cause the scheduler to crash if the ResourceRequest capability memory ==
> 0 (divide by zero).