[
https://issues.apache.org/jira/browse/MAPREDUCE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16761409#comment-16761409
]
BELUGA BEHR edited comment on MAPREDUCE-7180 at 2/6/19 3:28 AM:
----------------------------------------------------------------
Thanks for the clarification [~wilfreds]; I see where you are coming from.
How confident are we that the 80/20 split will cover all cases? Are there any
stats to back this up? The biggest case is an application failing with OOM
because a Mapper or Reducer needs more heap; with the 80/20 split, enlarging
the container size will automatically increase the heap size.
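For reference, here is a minimal sketch of how that plays out, based on my understanding of the MAPREDUCE-5785 behaviour: when no explicit -Xmx is set in mapreduce.map.java.opts, the task heap is derived from the container size via mapreduce.job.heap.memory-mb.ratio (default 0.8). The numbers below are illustrative only:
{code}
import org.apache.hadoop.conf.Configuration;

public class HeapFollowsContainer {
  public static void main(String[] args) {
    // Assumption: no explicit -Xmx in mapreduce.map.java.opts, so the task heap
    // follows the container size at the configured ratio (default 0.8).
    Configuration conf = new Configuration();
    conf.setInt("mapreduce.map.memory.mb", 2048);   // container: 1 GB -> 2 GB
    double ratio = conf.getDouble("mapreduce.job.heap.memory-mb.ratio", 0.8);
    int heapMb = (int) (conf.getInt("mapreduce.map.memory.mb", 1024) * ratio);
    System.out.println("Derived map task heap: ~" + heapMb + " MB");  // ~1638 MB
  }
}
{code}
So raising only the container size also raises the heap, without any java.opts changes.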
was (Author: belugabehr):
The two classes of errors are:
# Application fails with OOM - the Mapper or Reducer needs more heap; enlarging the
container size will automatically increase the heap size at the 80/20 split
# The application's overhead (the non-heap part of the container) is large
The first one is the most common.
> Relaunching Failed Containers
> -----------------------------
>
> Key: MAPREDUCE-7180
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7180
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Components: mrv1, mrv2
> Reporter: BELUGA BEHR
> Priority: Major
>
> In my experience, it is very common for an MR job to fail completely because a
> single Mapper/Reducer container is using more memory than has been reserved
> in YARN. The following message is logged by the MapReduce
> ApplicationMaster:
> {code}
> Container [pid=46028,containerID=container_e54_1435155934213_16721_01_003666]
> is running beyond physical memory limits.
> Current usage: 1.0 GB of 1 GB physical memory used; 2.7 GB of 2.1 GB virtual
> memory used. Killing container.
> {code}
> In this case, the container is re-launched on another node, and of course, it
> is killed again for the same reason. This process happens three (maybe
> four?) times before the entire MapReduce job fails. It's often said that the
> definition of insanity is doing the same thing over and over and expecting
> different results.
> For all intents and purposes, the amount of resources requested by Mappers
> and Reducers is fixed, based on the default configuration values.
> Users can set the memory on a per-job basis, but it is a pain, not exact, and
> requires intimate knowledge of the MapReduce framework and its memory usage
> patterns.
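> As a rough illustration of that per-job tuning, here is a sketch using the
> standard memory properties; the specific sizes and -Xmx values are just
> examples a user currently has to work out by hand:
> {code}
> import org.apache.hadoop.conf.Configuration;
>
> public class ManualMemoryTuning {
>   public static void main(String[] args) {
>     // Today's workaround: container size and heap size are picked by hand and
>     // must be kept in sync, leaving enough non-heap headroom in the container.
>     Configuration conf = new Configuration();
>     conf.setInt("mapreduce.map.memory.mb", 2048);        // map container size
>     conf.set("mapreduce.map.java.opts", "-Xmx1638m");    // map heap, chosen manually
>     conf.setInt("mapreduce.reduce.memory.mb", 4096);     // reduce container size
>     conf.set("mapreduce.reduce.java.opts", "-Xmx3276m"); // reduce heap, chosen manually
>   }
> }
> {code}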
> I propose that if the MR ApplicationMaster detects that a container was killed
> because of this specific memory resource constraint, it should request a
> larger container for the subsequent task attempt.
> For example, increase the requested memory size by 50% each time the
> container fails and the task is retried. This would prevent many Job failures
> and still allow additional per-Job memory tuning, after the fact, to get
> better performance (as opposed to merely succeeding vs. failing).
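> A rough sketch of the proposed escalation (hypothetical names; this is not
> existing ApplicationMaster code), assuming the physical-memory kill above can
> be detected from the container diagnostics:
> {code}
> // Hypothetical sketch of the proposal; nextRequestMb() and the 1.5 growth
> // factor are illustrative only and do not exist in the MR ApplicationMaster.
> public class EscalatingMemoryPolicy {
>   private static final double GROWTH = 1.5;
>
>   /** Memory (MB) to request for the next attempt after a physical-memory kill. */
>   static int nextRequestMb(int currentMb, boolean killedForPhysicalMemory) {
>     return killedForPhysicalMemory ? (int) Math.ceil(currentMb * GROWTH) : currentMb;
>   }
>
>   public static void main(String[] args) {
>     int mb = 1024;
>     for (int attempt = 1; attempt <= 3; attempt++) {
>       System.out.println("attempt " + attempt + " requests " + mb + " MB");
>       mb = nextRequestMb(mb, true);   // 1024 -> 1536 -> 2304
>     }
>   }
> }
> {code}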
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]