BELUGA BEHR created MAPREDUCE-7180:
--------------------------------------

             Summary: Relaunching Failed Containers
                 Key: MAPREDUCE-7180
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7180
             Project: Hadoop Map/Reduce
          Issue Type: New Feature
          Components: mrv1, mrv2
            Reporter: BELUGA BEHR


In my experience, it is very common for an MR job to fail completely because a 
single Mapper/Reducer container uses more memory than was reserved for it in 
YARN.  The following message is logged by the MapReduce ApplicationMaster:

{code}
Container [pid=46028,containerID=container_e54_1435155934213_16721_01_003666] 
is running beyond physical memory limits. 
Current usage: 1.0 GB of 1 GB physical memory used; 2.7 GB of 2.1 GB virtual 
memory used. Killing container.
{code}

In this case, the container is re-launched on another node, and of course, it 
is killed again for the same reason.  This process happens three (maybe four?) 
times before the entire MapReduce job fails.

For all intents and purposes, the amount of memory requested by Mappers and 
Reducers is a fixed amount, based on the default configuration values.  Users 
can set the memory on a per-job basis, but doing so is a pain, not exact, and 
requires intimate knowledge of the MapReduce framework and its memory usage 
patterns.
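
For reference, this is roughly the per-job tuning a user has to do today; a 
minimal sketch using the standard MRJobConfig properties, where the specific 
sizes are placeholders and not recommendations:

{code}
// Sketch of today's manual, per-job memory tuning (sizes are placeholders).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.MRJobConfig;

public class ManualMemoryTuning {
  public static Job configure() throws Exception {
    Configuration conf = new Configuration();
    // Container sizes requested from YARN for each task type.
    conf.setInt(MRJobConfig.MAP_MEMORY_MB, 2048);
    conf.setInt(MRJobConfig.REDUCE_MEMORY_MB, 4096);
    // The JVM heap must also be kept below the container size by hand.
    conf.set(MRJobConfig.MAP_JAVA_OPTS, "-Xmx1638m");
    conf.set(MRJobConfig.REDUCE_JAVA_OPTS, "-Xmx3276m");
    return Job.getInstance(conf, "manually-tuned-job");
  }
}
{code}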

I propose that if the MR ApplicationMaster detects that a container was killed 
because of this specific memory constraint, it should request a larger 
container for the subsequent task attempt.
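
A minimal sketch of what the retry-time bump could look like, assuming the AM 
keys off the YARN exit statuses for memory-limit kills; the helper class below 
is hypothetical (not existing MR code) and the 50% factor is just the example 
from this ticket:

{code}
// Hypothetical helper (not existing MapReduce code): pick the Resource for
// the next task attempt after YARN reports a memory-limit kill.
import org.apache.hadoop.yarn.api.records.ContainerExitStatus;
import org.apache.hadoop.yarn.api.records.Resource;

public class RetryResourceCalculator {
  // Assumed growth factor per retry; could be made configurable.
  private static final double GROWTH_FACTOR = 1.5;

  public static Resource nextAttemptResource(int containerExitStatus,
                                             Resource previousRequest) {
    if (containerExitStatus == ContainerExitStatus.KILLED_EXCEEDED_PMEM
        || containerExitStatus == ContainerExitStatus.KILLED_EXCEEDED_VMEM) {
      long bumpedMb =
          (long) Math.ceil(previousRequest.getMemorySize() * GROWTH_FACTOR);
      return Resource.newInstance(bumpedMb, previousRequest.getVirtualCores());
    }
    // Any other failure: keep the original request unchanged.
    return previousRequest;
  }
}
{code}

In practice the bumped request would also need to respect 
yarn.scheduler.maximum-allocation-mb, since YARN rejects requests above that 
limit.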

For example, increase the requested memory size by 50% each time the container 
fails for this reason and the task is retried.  This would prevent many job 
failures and leave per-job memory tuning as an after-the-fact exercise to get 
better performance, rather than a prerequisite for the job to succeed at all.
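
To make the 50% figure concrete, here is an illustrative (hypothetical) 
progression starting from a 1024 MB request, capped at an assumed cluster 
maximum of 8192 MB:

{code}
// Illustrative only: how a 1024 MB request would grow across retries with a
// 50% bump, never exceeding an assumed yarn.scheduler.maximum-allocation-mb
// of 8192 MB. Prints requests of 1024, 1536, 2304, 3456 MB.
public class MemoryEscalationExample {
  public static void main(String[] args) {
    long requestMb = 1024;
    final long clusterMaxMb = 8192;
    for (int attempt = 1; attempt <= 4; attempt++) {
      System.out.println("attempt " + attempt + ": request " + requestMb + " MB");
      requestMb = Math.min((long) Math.ceil(requestMb * 1.5), clusterMaxMb);
    }
  }
}
{code}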


