[jira] [Assigned] (MAPREDUCE-6689) MapReduce job can infinitely increase number of reducer resource requests

Wang Zhiqiang (JIRA) Thu, 05 May 2016 23:12:13 -0700

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Wang Zhiqiang reassigned MAPREDUCE-6689:
----------------------------------------

    Assignee: Wang Zhiqiang  (was: Wangda Tan)

> MapReduce job can infinitely increase number of reducer resource requests
> -------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6689
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6689
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Wangda Tan
>            Assignee: Wang Zhiqiang
>            Priority: Blocker
>         Attachments: MAPREDUCE-6689.1.patch
>
>
> We have seen this issue on one of our clusters: when running a terasort 
> MapReduce job, some mappers failed after the reducers had started, and the 
> MR AM then tried to preempt reducers to schedule the failed mappers.
> After that, the MR AM entered an infinite loop. On every 
> RMContainerAllocator#heartbeat run, it:
> - In {{preemptReducesIfNeeded}}, cancels all scheduled reducer requests 
> (total scheduled reducers = 1024).
> - Then, in {{scheduleReduces}}, ramps up all reducers again (total = 1024).
> As a result, the total number of requested containers increased by 1024 on 
> every MR AM-RM heartbeat (1 second per heartbeat). The AM hung for 18+ hours, 
> so we got 18 * 3600 * 1024 ~ 66M+ requested containers on the RM side.
> This bug also triggered YARN-4844, which made the RM stop scheduling 
> anything.
> Thanks to [~sidharta-s] for helping with the analysis. 
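
The loop described in the report can be illustrated with a toy model. This is a minimal sketch in Python, not Hadoop code: the class ToyAllocator and its fields are hypothetical, and the model simply assumes what the report observed, namely that each cancel-and-ramp-up cycle adds another 1024 requests to the RM-side count while the AM-side view stays unchanged. It does not explain why the cancellation fails to reduce the RM-side count.

```python
# Toy model of the cancel/ramp-up cycle from the report; not Hadoop code.
# Assumption: every ramp-up adds to the RM-side request count and the
# preceding cancellation never lowers it (this matches the observed symptom).

TOTAL_REDUCERS = 1024        # scheduled reducers, from the report
HEARTBEAT_INTERVAL_SEC = 1   # AM-RM heartbeat period, from the report


class ToyAllocator:
    """Hypothetical stand-in for the AM-side allocator state."""

    def __init__(self, total_reducers):
        self.total_reducers = total_reducers
        self.scheduled_reducers = total_reducers  # AM-side view
        self.rm_requested_containers = 0          # RM-side view

    def preempt_reduces_if_needed(self):
        # Step 1: cancel every scheduled reducer request on the AM side.
        self.scheduled_reducers = 0

    def schedule_reduces(self):
        # Step 2: ramp all reducers back up, issuing fresh requests.
        new_requests = self.total_reducers - self.scheduled_reducers
        self.scheduled_reducers += new_requests
        self.rm_requested_containers += new_requests

    def heartbeat(self):
        self.preempt_reduces_if_needed()
        self.schedule_reduces()


def simulate(hours):
    """Run one heartbeat per interval for the given number of hours."""
    allocator = ToyAllocator(TOTAL_REDUCERS)
    heartbeats = hours * 3600 // HEARTBEAT_INTERVAL_SEC
    for _ in range(heartbeats):
        allocator.heartbeat()
    return allocator


allocator = simulate(18)
print(allocator.scheduled_reducers)       # AM-side view stays at 1024
print(allocator.rm_requested_containers)  # 66355200, i.e. the ~66M above
```

Under these assumptions the AM never sees more than 1024 scheduled reducers, which is why the growth is only visible on the RM side.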



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

