[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Radwan updated MAPREDUCE-2788:
------------------------------------

    Attachment: MAPREDUCE-2788_rev3.patch

Thanks, Arun, for your comments. I have looked into normalizing the requests 
within the CapacityScheduler.

It doesn't seem that the call to LeafQueue.assignContainer(..) comes via 
CapacityScheduler.allocate(). It gets called through the call path:

LeafQueue.assignContainer(..) <- assignNodeLocalContainers(..) <- 
LeafQueue.assignContainersOnNode(..) <- LeafQueue.assignContainers(..)

There are alternative paths, but all lead to the same source.

The SchedulerApp application (in the LeafQueue.assignContainers(..) call) is 
one of the Map<ApplicationAttemptId, SchedulerApp> applicationsMap values. This 
applicationsMap is only populated through LeafQueue.addApplication(..). 

LeafQueue.addApplication(..) is called through the path: 
LeafQueue.addApplication(..) <- LeafQueue.submitApplication(..) <- 
CapacityScheduler.addApplication(..).

So I have added code to CapacityScheduler.addApplication(..) to normalize all 
resource requests for the SchedulerApp before submitting to the queue.
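
To illustrate the idea of normalizing, here is a minimal sketch. It is not the 
code in the attached patch: the class name MemoryNormalizer, the method name, 
and the rule of rounding up to a multiple of a configured minimum are all 
assumptions made for the example. The point is only that a normalized request 
can never carry a memory value of 0, so a later division by it is safe.

```java
/**
 * Hypothetical helper, for illustration only (not part of the patch).
 * Assumes memory is expressed in MB and that the scheduler has a
 * positive minimum allocation to normalize against.
 */
final class MemoryNormalizer {
    private MemoryNormalizer() {}

    /**
     * Rounds requestedMb up to the nearest positive multiple of minimumMb.
     * A request of 0 (or less) is raised to minimumMb, so the result is
     * never 0 and can safely be used as a divisor.
     */
    static int normalize(int requestedMb, int minimumMb) {
        if (minimumMb <= 0) {
            throw new IllegalArgumentException("minimumMb must be positive");
        }
        // Raise anything below the minimum (including 0) to the minimum.
        int atLeastMin = Math.max(requestedMb, minimumMb);
        // Ceiling division: number of minimum-sized units needed.
        int units = (atLeastMin + minimumMb - 1) / minimumMb;
        return units * minimumMb;
    }
}
```

With a minimum of 1024, a request of 0 becomes 1024 and a request of 1500 
becomes 2048, so a computation such as available / required no longer sees a 
zero divisor.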

If the LeafQueue is inextricably tied to CS, we may need to update the 
references in LeafQueue to use CapacityScheduler instead of 
CapacitySchedulerContext. This would make the dependency explicit and avoid 
future confusion. I haven't made this interface change in the attached patch, 
as it requires more changes to other components, but if we agree on it, I can 
do it in a follow-up issue.
                
> LeafQueue.assignContainer() can cause a crash if 
> request.getCapability().getMemory() == 0
> -----------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2788
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2788
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>            Reporter: Ahmed Radwan
>            Assignee: Ahmed Radwan
>            Priority: Critical
>         Attachments: MAPREDUCE-2788.patch, MAPREDUCE-2788_rev2.patch, 
> MAPREDUCE-2788_rev3.patch
>
>
> The assignContainer() method in 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
> can cause the scheduler to crash if the ResourceRequest capability memory == 
> 0 (divide by zero).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
