[
https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Peter Bacsko updated MAPREDUCE-5124:
------------------------------------
Attachment: MAPREDUCE-5124-CoalescingPOC3.patch
I attached POC v3.
Main changes:
1. Pending status events are stored in TaskAttemptListener
2. A map associates each attemptId with its pending status
3. Each status is wrapped in an AtomicReference, so no locking is necessary
4. When an async update is necessary, we pass the AtomicReference to the
constructor of the task update event
5. We don't simply replace an already existing status update event. Counters &
fetch failed maps are merged if necessary.
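
To illustrate the five points above, here is a minimal, self-contained sketch of
the coalescing idea. It is not the code from the patch: StatusSketch,
CoalescingListenerSketch, statusUpdate and drain are hypothetical names, attempt
ids are plain strings, and the merge rules (newest progress wins, counters are
overwritten per key, fetch failed maps are unioned) are assumptions made only
for illustration. In the patch the event itself carries the AtomicReference;
here a boolean return value stands in for "a new event must be queued".

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical, immutable stand-in for a task attempt status report. */
final class StatusSketch {
    final float progress;
    final Map<String, Long> counters;   // assumed to be cumulative values
    final Set<String> fetchFailedMaps;  // assumed to be ids of maps whose fetch failed

    StatusSketch(float progress, Map<String, Long> counters, Set<String> fetchFailedMaps) {
        this.progress = progress;
        this.counters = new HashMap<>(counters);
        this.fetchFailedMaps = new LinkedHashSet<>(fetchFailedMaps);
    }

    /** Assumed merge rule: newest progress and counter values win, failures are unioned. */
    static StatusSketch merge(StatusSketch pending, StatusSketch incoming) {
        Map<String, Long> counters = new HashMap<>(pending.counters);
        counters.putAll(incoming.counters);
        Set<String> failed = new LinkedHashSet<>(pending.fetchFailedMaps);
        failed.addAll(incoming.fetchFailedMaps);
        return new StatusSketch(incoming.progress, counters, failed);
    }
}

/** Hypothetical listener that keeps at most one pending status per attempt. */
final class CoalescingListenerSketch {
    private final ConcurrentHashMap<String, AtomicReference<StatusSketch>> pending =
            new ConcurrentHashMap<>();

    /**
     * Records a status report without locking.
     * Returns true when no status was pending, i.e. the caller has to queue
     * a new async update event for this attempt.
     */
    boolean statusUpdate(String attemptId, StatusSketch incoming) {
        AtomicReference<StatusSketch> ref =
                pending.computeIfAbsent(attemptId, id -> new AtomicReference<>());
        // The accumulator is side-effect free, so CAS retries are harmless.
        StatusSketch previous = ref.getAndAccumulate(
                incoming, (old, in) -> old == null ? in : StatusSketch.merge(old, in));
        return previous == null;
    }

    /** Called by the event handler: takes the coalesced status and clears the slot. */
    StatusSketch drain(String attemptId) {
        AtomicReference<StatusSketch> ref = pending.get(attemptId);
        return ref == null ? null : ref.getAndSet(null);
    }
}
```

With this shape, any number of reports arriving between two drain calls cost one
queued event and one status object per attempt, which is what bounds the memory
used by status events.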
> AM lacks flow control for task events
> -------------------------------------
>
> Key: MAPREDUCE-5124
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am
> Affects Versions: 2.0.3-alpha, 0.23.5
> Reporter: Jason Lowe
> Assignee: Peter Bacsko
> Attachments: MAPREDUCE-5124-CoalescingPOC-1.patch,
> MAPREDUCE-5124-CoalescingPOC2.patch, MAPREDUCE-5124-CoalescingPOC3.patch,
> MAPREDUCE-5124-proto.2.txt, MAPREDUCE-5124-prototype.txt
>
>
> The AM does not have any flow control to limit the incoming rate of events
> from tasks. If the AM is unable to keep pace with the rate of incoming
> events for a sufficient period of time then it will eventually exhaust the
> heap and crash. MAPREDUCE-5043 addressed a major bottleneck for event
> processing, but the AM could still get behind if it's starved for CPU and/or
> handling a very large job with tens of thousands of active tasks.