[ https://issues.apache.org/jira/browse/MAPREDUCE-4299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286915#comment-13286915 ]

Tom White commented on MAPREDUCE-4299:
--------------------------------------

The code in RMContainerAllocator is meant to handle this case by ramping up the 
number of reducers as maps finish. However, there seems to be something fishy 
about the total amount of memory available to the job. Compare

2012-05-24 16:47:25,803 INFO [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 
0.3 totalMemLimit:63488 finalMapMemLimit:44442 finalReduceMemLimit:19046 
netScheduledMapMem:117760 netScheduledReduceMem:15360

to

2012-05-24 16:47:07,521 INFO [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: 
PendingReduces:30 ScheduledMaps:160 ScheduledReduces:0 AssignedMaps:0 
AssignedReduces:0 completedMaps:0 completedReduces:0 containersAllocated:0 
containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:0 
availableResources(headroom):memory: 32768

The first says that there is 63488 MB of memory, the second 32768 MB (these 
numbers stay the same throughout the job). So what could be happening is that 
the allocator slowly ramps up the number of reducers until they use up the 
32768 MB (32 slots at 1024 MB apiece), thinking that there is still memory 
available when there isn't. The code conflates the terms 'available 
resource', 'headroom', and 'cluster resource' - i.e. it's not clear whether the 
available resource is a fixed total or just what's not currently in use. 
RMContainerAllocator.getMemLimit() suggests the latter, while the FifoScheduler 
has the line {{application.setHeadroom(clusterResource)}}, which suggests that 
it's a fixed total.
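
To make the arithmetic concrete, here is a minimal, self-contained sketch (not 
the actual RMContainerAllocator code) of the accounting the log lines above 
imply: the reduce limit appears to be the completed-map fraction of the total 
limit, and the total limit appears to be headroom plus already-assigned memory. 
The memLimit() helper and the 1024 MB container sizes are assumptions for 
illustration. If the scheduler reports the fixed cluster total as headroom - as 
{{application.setHeadroom(clusterResource)}} would - the perceived limit grows 
as reducers are assigned instead of shrinking, which would explain the hang:

{code:java}
/**
 * Minimal sketch, not the actual Hadoop source. Illustrates the memory
 * accounting implied by the log lines above, under the assumption that
 * "headroom" is supposed to mean memory not currently in use.
 */
public class ReduceRampUpSketch {

  // Hypothetical container sizes, matching the 1024 MB slots mentioned above.
  static final int MAP_MEM_MB = 1024;
  static final int REDUCE_MEM_MB = 1024;

  // Assumed shape of the limit: headroom plus memory already assigned.
  static int memLimit(int headroomMb, int assignedMaps, int assignedReduces) {
    return headroomMb + assignedMaps * MAP_MEM_MB + assignedReduces * REDUCE_MEM_MB;
  }

  public static void main(String[] args) {
    // Numbers from the first log line.
    double completedMapPercent = 0.3;
    int totalMemLimit = 63488;

    // Split the total limit by the completed-map fraction, as the log suggests:
    // 0.3 * 63488 = 19046, and 63488 - 19046 = 44442.
    int finalReduceMemLimit = (int) (completedMapPercent * totalMemLimit);
    int finalMapMemLimit = totalMemLimit - finalReduceMemLimit;
    System.out.println("reduce limit = " + finalReduceMemLimit
        + " MB, map limit = " + finalMapMemLimit + " MB");

    // If the scheduler always reports the fixed 32768 MB cluster total as
    // headroom, assigned reducers get double-counted and the perceived
    // limit keeps growing instead of shrinking towards zero.
    int fixedHeadroom = 32768;
    for (int assignedReduces = 0; assignedReduces <= 32; assignedReduces += 8) {
      System.out.println("assigned reduces = " + assignedReduces
          + ", perceived limit = "
          + memLimit(fixedHeadroom, 0, assignedReduces) + " MB");
    }
  }
}
{code}

With a correct headroom the perceived limit in the loop would fall towards zero 
as reducers are assigned; with a fixed headroom it climbs from 32768 MB to 
65536 MB, so the allocator keeps scheduling reducers until the maps are starved.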
                
> Terasort hangs with MR2 FifoScheduler
> -------------------------------------
>
>                 Key: MAPREDUCE-4299
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4299
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.0.0-alpha
>            Reporter: Tom White
>
> What happens is that the number of reducers ramps up until they occupy all of 
> the job's containers, at which point the maps no longer make any progress and 
> the job hangs.
> When the same job is run with the CapacityScheduler it succeeds, so this 
> looks like a FifoScheduler bug.
