[jira] [Updated] (MAPREDUCE-5124) AM lacks flow control for task events

Peter Bacsko (JIRA) Wed, 08 Nov 2017 08:56:08 -0800

[ https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Peter Bacsko updated MAPREDUCE-5124:
------------------------------------
    Attachment: MAPREDUCE-5124-CoalescingPOC-1.patch

I created a POC which uses this "event coalescing approach".

Roughly, here is what changed:
* Added a new method {{setNextUpdate()}} to {{TaskAttemptImpl}}
* Added a mapping of {{TaskAttemptID}} <-> {{TaskAttemptImpl}}
* On each {{statusUpdate()}}, we call {{setNextUpdate()}} and no longer pass the 
status object as the event payload
* In the {{StatusUpdater}} transition, we check whether the status needs to be 
updated. If {{needsUpdate}} is true, we run the original updater logic.

If there is a backlog of task update events for a given attempt and no new 
status has arrived for that attempt since the last one was processed, the 
{{StatusUpdater}} does nothing, because {{needsUpdate}} is false.
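
To illustrate the idea outside of the patch, here is a minimal, self-contained sketch of the coalescing pattern. It is not the POC code: {{AttemptSketch}}, {{CoalescingDemo}}, {{onStatusUpdateEvent()}} and the use of a plain String as the status are hypothetical stand-ins, and the "pending status is non-null" check plays the role of {{needsUpdate}}.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for TaskAttemptImpl; only the coalescing part is modelled.
final class AttemptSketch {
    // Most recent status reported by the task; null means no update is pending.
    private final AtomicReference<String> nextUpdate = new AtomicReference<>();
    private String appliedStatus = "NEW";
    private int appliedCount = 0;

    // RPC side: overwrite any pending status instead of queuing it as a payload.
    void setNextUpdate(String status) {
        nextUpdate.set(status);
    }

    // Dispatcher side: plays the role of the StatusUpdater transition.
    void onStatusUpdateEvent() {
        String pending = nextUpdate.getAndSet(null);
        if (pending == null) {
            return; // nothing new since the last event: this backlog event is a no-op
        }
        appliedStatus = pending; // the original updater logic would run here
        appliedCount++;
    }

    String appliedStatus() { return appliedStatus; }
    int appliedCount() { return appliedCount; }
}

// Hypothetical miniature of the AM side: an ID -> attempt map plus an event queue.
// The demo is single-threaded, so plain HashMap and ArrayDeque are sufficient here.
final class CoalescingDemo {
    private final Map<String, AttemptSketch> attempts = new HashMap<>();
    private final Queue<String> eventQueue = new ArrayDeque<>();

    // Mirrors statusUpdate(): store the status on the attempt, enqueue only the ID.
    void statusUpdate(String attemptId, String status) {
        attempts.computeIfAbsent(attemptId, id -> new AttemptSketch()).setNextUpdate(status);
        eventQueue.add(attemptId);
    }

    // Process every queued event, as the dispatcher thread would.
    void drain() {
        String attemptId;
        while ((attemptId = eventQueue.poll()) != null) {
            attempts.get(attemptId).onStatusUpdateEvent();
        }
    }

    AttemptSketch attempt(String attemptId) { return attempts.get(attemptId); }

    public static void main(String[] args) {
        CoalescingDemo am = new CoalescingDemo();
        am.statusUpdate("attempt_0", "progress=0.1");
        am.statusUpdate("attempt_0", "progress=0.5");
        am.statusUpdate("attempt_0", "progress=0.9");
        am.drain(); // three events are processed, but only the newest status is applied
        AttemptSketch a = am.attempt("attempt_0");
        System.out.println(a.appliedStatus() + " applied " + a.appliedCount() + " time(s)");
    }
}
```

In this sketch the queue still receives one event per status report, but each event carries only the attempt ID, and every event after the first in a backlog returns immediately without running the updater logic.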

I also kept the original update path, that is, retrieving the status from the 
event. I first tried to remove the original constructor of 
{{TaskAttemptStatusUpdateEvent}}, but that caused compilation errors in various 
classes. It turned out that quite a few test cases use the old approach to 
manipulate the status of a task attempt, and I didn't want to introduce too 
many code changes. I'm not sure what the best solution is in this case.

[~jlowe] could you take a look at this POC?

> AM lacks flow control for task events
> -------------------------------------
>
>                 Key: MAPREDUCE-5124
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.3-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Peter Bacsko
>         Attachments: MAPREDUCE-5124-CoalescingPOC-1.patch, 
> MAPREDUCE-5124-proto.2.txt, MAPREDUCE-5124-prototype.txt
>
>
> The AM does not have any flow control to limit the incoming rate of events 
> from tasks.  If the AM is unable to keep pace with the rate of incoming 
> events for a sufficient period of time then it will eventually exhaust the 
> heap and crash.  MAPREDUCE-5043 addressed a major bottleneck for event 
> processing, but the AM could still get behind if it's starved for CPU and/or 
> handling a very large job with tens of thousands of active tasks.



