[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241907#comment-15241907
 ] 

Jason Lowe commented on MAPREDUCE-5124:
---------------------------------------

For this case I think increasing the task heartbeat period would help.  The 
problem is that task heartbeats are piling up asynchronous events faster than 
the AM can consume them, so if heartbeats were posted less frequently the AM 
could keep up.  A configurable task report interval would be a simple and 
straightforward approach, giving users a knob to turn when this happens on 
large jobs that run very wide.  Alternatively, the AM could look at the number 
of queued events and automatically tune the task heartbeat interval.
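
As a rough illustration of the second option, here is a minimal sketch of 
backlog-based tuning.  The class name AdaptiveHeartbeat, its parameters, and 
the linear scaling rule are all assumptions made up for this example; none of 
it is existing AM code or an existing Hadoop API.  The idea is only that the 
AM would report the interval tasks should wait before their next heartbeat, 
stretching it as the event queue grows and capping it at a maximum.

```java
/**
 * Hypothetical helper (not part of Hadoop): picks the next task heartbeat
 * interval from the number of events currently queued in the AM.
 */
final class AdaptiveHeartbeat {
    private final long baseIntervalMs;   // interval used while the AM keeps up
    private final long maxIntervalMs;    // upper bound on the stretched interval
    private final int backlogThreshold;  // queue depth at which throttling starts

    AdaptiveHeartbeat(long baseIntervalMs, long maxIntervalMs, int backlogThreshold) {
        if (baseIntervalMs <= 0 || maxIntervalMs < baseIntervalMs || backlogThreshold <= 0) {
            throw new IllegalArgumentException("invalid heartbeat tuning parameters");
        }
        this.baseIntervalMs = baseIntervalMs;
        this.maxIntervalMs = maxIntervalMs;
        this.backlogThreshold = backlogThreshold;
    }

    /**
     * Returns the base interval while the backlog is at or below the
     * threshold; above it, scales the interval linearly with the backlog
     * and caps the result at maxIntervalMs.
     */
    long nextIntervalMs(int queuedEvents) {
        if (queuedEvents <= backlogThreshold) {
            return baseIntervalMs;
        }
        long scaled = baseIntervalMs * queuedEvents / backlogThreshold;
        return Math.min(scaled, maxIntervalMs);
    }
}
```

With a base of 3000 ms, a cap of 60000 ms, and a threshold of 1000 queued 
events, a backlog of 2000 events would double the interval to 6000 ms.  The 
cap matters because the interval still has to stay well below whatever task 
timeout is in effect, otherwise throttled tasks could be mistaken for hung 
ones.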

> AM lacks flow control for task events
> -------------------------------------
>
>                 Key: MAPREDUCE-5124
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.3-alpha, 0.23.5
>            Reporter: Jason Lowe
>         Attachments: MAPREDUCE-5124-proto.2.txt, MAPREDUCE-5124-prototype.txt
>
>
> The AM does not have any flow control to limit the incoming rate of events 
> from tasks.  If the AM is unable to keep pace with the rate of incoming 
> events for a sufficient period of time then it will eventually exhaust the 
> heap and crash.  MAPREDUCE-5043 addressed a major bottleneck for event 
> processing, but the AM could still get behind if it's starved for CPU and/or 
> handling a very large job with tens of thousands of active tasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
