[
https://issues.apache.org/jira/browse/MAPREDUCE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480289#comment-13480289
]
Jason Lowe commented on MAPREDUCE-4730:
---------------------------------------
Update on testing: I was able to test this (along with the fix for
MAPREDUCE-4733) using a sleep job with 20000 maps and 3000 reduces on a cluster
big enough to mass-launch the map and reduce phases. The AM with a 1.5GB slot
size stayed up for the whole job, whereas previously it failed even with a
larger slot.
The only issue I ran into was that a significant number of maps and reduces
failed because they timed out trying to establish a connection to the AM. I
suspected the AM might have been busy garbage collecting, causing the delays,
so I bumped the AM size up to 3GB. The job then ran smoothly, with no
connection timeout failures from any tasks.
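
For anyone wanting to repeat the run with the larger AM, here is a minimal
sketch of how the options might be put together. The comment above does not
say which settings were changed, so this assumes the standard
yarn.app.mapreduce.am.resource.mb and yarn.app.mapreduce.am.command-opts
properties. The AmMemoryOpts class and the 80% heap-to-container ratio are
hypothetical and only for illustration, and the -m/-r arguments assume the
stock sleep job's option names.

```java
import java.util.ArrayList;
import java.util.List;

public class AmMemoryOpts {
    // Hypothetical rule of thumb: give the JVM heap 4/5 of the container,
    // leaving the rest for non-heap memory. Not taken from the JIRA comment.
    static int heapMb(int containerMb) {
        return containerMb * 4 / 5;
    }

    // Builds generic -D options for the AM container size and JVM heap.
    // Assumes the job is launched through ToolRunner so -D options are honored.
    static List<String> amOptions(int containerMb) {
        List<String> opts = new ArrayList<>();
        opts.add("-Dyarn.app.mapreduce.am.resource.mb=" + containerMb);
        opts.add("-Dyarn.app.mapreduce.am.command-opts=-Xmx"
                + heapMb(containerMb) + "m");
        return opts;
    }

    public static void main(String[] args) {
        // 3072 MB corresponds to the 3GB AM size mentioned above.
        List<String> cmd = new ArrayList<>(amOptions(3072));
        // Map and reduce counts from the test described above.
        cmd.add("-m");
        cmd.add("20000");
        cmd.add("-r");
        cmd.add("3000");
        System.out.println(String.join(" ", cmd));
    }
}
```

The printed string is meant to be appended after the sleep job's class name
on the hadoop jar command line. The heap is kept below the container size
so that the AM is not killed for exceeding its memory allocation.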
> AM crashes due to OOM while serving up map task completion events
> -----------------------------------------------------------------
>
> Key: MAPREDUCE-4730
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4730
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster, mrv2
> Affects Versions: 0.23.3
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Priority: Blocker
> Attachments: MAPREDUCE-4730.patch, MAPREDUCE-4730.patch
>
>
> We're seeing a repeatable OOM crash in the AM for a task with around 30000
> maps and 3000 reducers. Details to follow.