[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774263#comment-13774263
 ] 

wanbin commented on MAPREDUCE-5510:
-----------------------------------

I have analyzed this again carefully. At that time, two big MapReduce jobs were 
running at the same time, each in a different queue. As a result the root queue 
was full because of over-capacity, but one of the leaf queues was not full: the 
cluster's resources were exhausted and the reduces had already occupied them in 
advance, so in the end the map tasks hung. I know YARN has a rule that when 
there are not enough resources to assign, the map tasks trigger preemption of 
reduce tasks, killing some reduces to guarantee that the maps can finish. But 
why did it have no effect this time?
I went through the AM and ResourceManager logs and found that preemption is 
only triggered when the leaf queue itself runs out of resources; it does not 
consider the usage of the parent queue or the root queue. So the cluster's 
resources can be exhausted while preemption is still never triggered.
Therefore I think that when the ResourceManager calculates a leaf queue's 
headroom, it should also take the parent queue's usage into account: if the 
parent queue is full, jobs in the leaf queue should consider triggering 
preemption.
I will provide a patch later. 
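
A rough sketch of the check I have in mind (illustrative only: QueueUsage and 
the method names are made up for this example, not the actual CapacityScheduler 
or RMContainerAllocator code, and the numbers are only loosely based on the log 
below):

{code:java}
// Illustrative sketch only -- hypothetical classes, not real Hadoop code.
public class PreemptionSketch {

    /** Hypothetical snapshot of a queue's capacity and current usage, in MB. */
    static class QueueUsage {
        final long capacityMb;
        final long usedMb;
        final QueueUsage parent;   // null for the root queue

        QueueUsage(long capacityMb, long usedMb, QueueUsage parent) {
            this.capacityMb = capacityMb;
            this.usedMb = usedMb;
            this.parent = parent;
        }

        long headroomMb() {
            return Math.max(0, capacityMb - usedMb);
        }
    }

    /**
     * Current behaviour as I read it: reduces are preempted only when the
     * leaf queue itself reports too little headroom for the pending maps.
     */
    static boolean shouldPreemptReducesLeafOnly(QueueUsage leaf, long pendingMapMb) {
        return leaf.headroomMb() < pendingMapMb;
    }

    /**
     * Proposed behaviour: also walk up through the parent/root queues, because
     * with over-capacity an ancestor queue can be exhausted even though the
     * leaf queue still reports headroom.
     */
    static boolean shouldPreemptReducesWithAncestors(QueueUsage leaf, long pendingMapMb) {
        for (QueueUsage q = leaf; q != null; q = q.parent) {
            if (q.headroomMb() < pendingMapMb) {
                return true;   // some ancestor (possibly root) cannot fit the maps
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Root queue exhausted by over-capacity use; leaf queue nominally has room.
        QueueUsage root = new QueueUsage(2_548_736, 2_548_736, null);
        QueueUsage leaf = new QueueUsage(1_000_000, 417_344, root);
        long pendingMapMb = 22 * 3_072;   // e.g. 22 pending maps at 3 GB each

        System.out.println("leaf-only check:      " + shouldPreemptReducesLeafOnly(leaf, pendingMapMb));
        System.out.println("with-ancestors check: " + shouldPreemptReducesWithAncestors(leaf, pendingMapMb));
    }
}
{code}

With the leaf-only check the AM never sees the shortage, because the leaf queue 
still reports plenty of headroom even though the root queue has nothing left; 
the ancestor-aware check would trigger reduce preemption in that case.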

                
> Root queue is full, leading to a hung job: its reduces started, but some maps 
> are pending
> ---------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5510
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5510
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>    Affects Versions: 2.0.6-alpha
>            Reporter: wanbin
>
> when it happened, I noticed the reduces were not preempted
> 2013-09-15 10:32:26,608 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated 
> containers 1
> 2013-09-15 10:32:26,608 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2013-09-15 10:32:26,608 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned 
> container container_1377833725640_158209_01_004100 to 
> attempt_1377833725640_158209_r_000258_0
> 2013-09-15 10:32:26,608 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating 
> schedule, headroom=582656
> 2013-09-15 10:32:26,608 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: 
> completedMapPercent 0.9940104 totalMemLimit:2548736 finalMapMemLimit:70656 
> finalReduceMemLimit:2478080 netScheduledMapMem:70656 
> netScheduledReduceMem:2460672
> 2013-09-15 10:32:26,608 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 1
> 2013-09-15 10:32:26,608 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: 
> PendingReds:212 ScheduledMaps:22 ScheduledReds:55 AssignedMaps:1 
> AssignedReds:213 CompletedMaps:3817 CompletedReds:0 ContAlloc:4031 ContRel:0 
> HostLocal:0 RackLocal:0

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
