[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated MAPREDUCE-5124:
------------------------------------
    Attachment: MAPREDUCE-5124-001.patch

Thanks for the comments [~jlowe]. I uploaded the first patch.

What's new:
1. Your comments have been addressed.
2. I added new tests and heavily refactored {{TestTaskAttemptListenerImpl}}, 
because it contained a lot of copy-pasted code. I think it's much nicer now.

I still have one question regarding counters. While writing the tests, I 
found that the {{Counters}} object inside the {{TaskStatus}} cannot be null. 
If it is, an NPE is thrown from the constructor of {{AbstractCounters}}. 
I'm talking about this part:

{noformat}
    taskAttemptStatus.counters = new org.apache.hadoop.mapreduce.Counters(
      taskStatus.getCounters());
{noformat}

We already know that counters are always sent from MR tasks, but what about 
Tez? I checked the Tez codebase, but I could not find any call to 
{{statusUpdate()}}.
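
If a client could ever send a status without counters, one option would be a 
null guard around the copy. The following is only a minimal, self-contained 
sketch of that idea and not what the patch does: {{SketchCounters}} and 
{{CounterGuardSketch}} are hypothetical stand-ins, not Hadoop classes, and the 
copy constructor merely mimics the NPE-on-null behaviour described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a counters container; NOT the Hadoop Counters class.
final class SketchCounters {
    private final Map<String, Long> values = new HashMap<>();

    SketchCounters() {}

    // Copy constructor: like the constructor discussed above, it fails on null input.
    SketchCounters(SketchCounters other) {
        // Dereferencing 'other' throws NullPointerException when other == null.
        this.values.putAll(other.values);
    }

    void increment(String name, long delta) {
        values.merge(name, delta, Long::sum);
    }

    long get(String name) {
        return values.getOrDefault(name, 0L);
    }

    int size() {
        return values.size();
    }
}

// Hypothetical helper showing the guard; the name is an assumption for illustration.
final class CounterGuardSketch {
    // Returns an independent copy of the reported counters,
    // or an empty instance when the sender reported none.
    static SketchCounters copyOrEmpty(SketchCounters reported) {
        return reported == null ? new SketchCounters() : new SketchCounters(reported);
    }

    public static void main(String[] args) {
        SketchCounters reported = new SketchCounters();
        reported.increment("MAP_INPUT_RECORDS", 42L);

        // Normal case: counters were sent and are copied.
        System.out.println(copyOrEmpty(reported).get("MAP_INPUT_RECORDS"));
        // Guarded case: no counters were sent, so an empty set is used instead of an NPE.
        System.out.println(copyOrEmpty(null).size());
    }
}
```

Whether an empty set or keeping the previously reported counters is the right 
fallback depends on how the AM consumes the status, so that choice is left 
open here.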

> AM lacks flow control for task events
> -------------------------------------
>
>                 Key: MAPREDUCE-5124
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.3-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Peter Bacsko
>         Attachments: MAPREDUCE-5124-001.patch, 
> MAPREDUCE-5124-CoalescingPOC-1.patch, MAPREDUCE-5124-CoalescingPOC2.patch, 
> MAPREDUCE-5124-CoalescingPOC3.patch, MAPREDUCE-5124-proto.2.txt, 
> MAPREDUCE-5124-prototype.txt
>
>
> The AM does not have any flow control to limit the incoming rate of events 
> from tasks.  If the AM is unable to keep pace with the rate of incoming 
> events for a sufficient period of time then it will eventually exhaust the 
> heap and crash.  MAPREDUCE-5043 addressed a major bottleneck for event 
> processing, but the AM could still get behind if it's starved for CPU and/or 
> handling a very large job with tens of thousands of active tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)