[ https://issues.apache.org/jira/browse/MAPREDUCE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032503#comment-14032503 ]

Jason Lowe commented on MAPREDUCE-5928:
---------------------------------------

I'm wondering whether the issue is triggered by the fact that the nodemanager 
memory has a fractional remainder when it's "full".  With all tasks being 512MB, 
each node will have 152MB left over (2200 - 4 x 512).  I'm guessing that with 
enough nodes those remainders add up to what looks like enough space to run 
another task, but in reality that task cannot be scheduled because the reported 
free memory is fragmented across nodes.
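
To make that concrete, here is a minimal sketch of the arithmetic (plain Java, 
not actual scheduler code; the node count and sizes are taken from the 
description below):

{code:java}
// Illustrative only: shows how per-node leftovers add up cluster-wide to more
// than a container's worth while no single node can host another container.
public class FragmentationSketch {
    public static void main(String[] args) {
        int nodes = 7;              // worker nodes in the cluster
        int nodeCapacityMb = 2200;  // yarn.nodemanager.resource.memory-mb
        int containerMb = 512;      // mapreduce.map/reduce.memory.mb

        int containersPerNode = nodeCapacityMb / containerMb;  // 4
        int leftoverPerNode = nodeCapacityMb % containerMb;    // 152
        int clusterLeftover = leftoverPerNode * nodes;         // 1064

        System.out.println("Containers per node: " + containersPerNode);
        // Cluster-wide leftover looks big enough for one more container...
        System.out.println("Cluster-wide leftover: " + clusterLeftover
                + " MB, fits a container? " + (clusterLeftover >= containerMb));   // true
        // ...but no individual node actually has room for it.
        System.out.println("Largest single-node leftover: " + leftoverPerNode
                + " MB, fits a container? " + (leftoverPerNode >= containerMb));   // false
    }
}
{code}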

> Deadlock allocating containers for mappers and reducers
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-5928
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5928
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>         Environment: Hadoop 2.4.0 (as packaged by HortonWorks in HDP 2.1.2)
>            Reporter: Niels Basjes
>         Attachments: Cluster fully loaded.png.jpg, MR job stuck in 
> deadlock.png.jpg
>
>
> I have a small cluster consisting of 8 desktop class systems (1 master + 7 
> workers).
> Due to the small memory of these systems I configured yarn as follows:
> {quote}
> yarn.nodemanager.resource.memory-mb = 2200
> yarn.scheduler.minimum-allocation-mb = 250
> {quote}
> On my client I did
> {quote}
> mapreduce.map.memory.mb = 512
> mapreduce.reduce.memory.mb = 512
> {quote}
> Now I run a job with 27 mappers and 32 reducers.
> After a while I saw this deadlock occur:
> - All nodes had been filled to their maximum capacity with reducers.
> - 1 mapper was waiting for a container slot to start in.
> I tried killing reducer attempts, but that didn't help (the replacement reducer 
> attempts simply took over the freed containers).
> *Workaround*:
> I set the following value from my job (the default is 0.05, i.e. 5%):
> {quote}
> mapreduce.job.reduce.slowstart.completedmaps = 0.99f
> {quote}
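
For reference, one way to apply that per-job override is via the standard Hadoop 
generic options (assuming the job driver goes through ToolRunner/GenericOptionsParser; 
the jar and class names below are placeholders):

{quote}
hadoop jar myjob.jar MyDriver -D mapreduce.job.reduce.slowstart.completedmaps=0.99 ...
{quote}

or programmatically before submission with 
job.getConfiguration().setFloat("mapreduce.job.reduce.slowstart.completedmaps", 0.99f).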



--
This message was sent by Atlassian JIRA
(v6.2#6252)
