[ https://issues.apache.org/jira/browse/HADOOP-4523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642871#action_12642871 ]

Vivek Ratan commented on HADOOP-4523:
-------------------------------------

HADOOP-3759 provides a configuration value, 
_mapred.tasktracker.tasks.maxmemory_, which specifies the total virtual 
memory (VM) on a machine available to tasks spawned by the TT. Along with 
HADOOP-4439, it provides a cluster-wide default for the maximum VM allowed 
per task, _mapred.task.default.maxmemory_; this value can be overridden by 
individual jobs. HADOOP-3581 implements a monitoring mechanism that kills 
tasks if they go over their _maxmemory_ value. Keeping all this in mind, 
here's a proposal for what we additionally need to do: 

If _tasks.maxmemory_ is set, the TT monitors the total memory usage of all 
tasks it has spawned. If this total goes over _tasks.maxmemory_, the TT 
needs to kill one or more tasks. It first looks for tasks whose individual 
memory usage is over their own _maxmemory_ limit, and kills all of them. 
(While you may ideally want to kill just enough of them to bring the total 
usage back under the limit, it's not obvious which violators to choose, so 
it's probably simpler to kill them all.) If no such task is found, or if 
killing all of them still leaves the total over the limit, we need to pick 
other tasks to kill. There are many ways to do this; probably the easiest 
is to kill the most recently launched tasks, as sketched below. 
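
To make the two-pass policy concrete, here's a rough sketch. The class, 
field, and method names here are hypothetical stand-ins, not actual 
TaskTracker code: 

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch only: TaskInfo and this policy class are
// illustrative stand-ins, not actual TaskTracker code.
class MemoryKillPolicy {

  static class TaskInfo {
    final String taskId;
    final long memUsed;    // current VM usage, in bytes
    final long maxMemory;  // per-task limit (job override or cluster default)
    final long startTime;  // launch time, used to find the most recent tasks

    TaskInfo(String taskId, long memUsed, long maxMemory, long startTime) {
      this.taskId = taskId;
      this.memUsed = memUsed;
      this.maxMemory = maxMemory;
      this.startTime = startTime;
    }
  }

  /**
   * Picks the tasks to kill when the node-wide total exceeds
   * tasksMaxMemory: first every task over its own per-task limit, then
   * the most recently launched tasks until the total fits.
   */
  static List<TaskInfo> selectTasksToKill(List<TaskInfo> running,
                                          long tasksMaxMemory) {
    long total = 0;
    for (TaskInfo t : running) {
      total += t.memUsed;
    }
    List<TaskInfo> toKill = new ArrayList<TaskInfo>();
    if (total <= tasksMaxMemory) {
      return toKill;  // within the node-wide limit, nothing to do
    }

    // Pass 1: kill every task over its individual limit.
    List<TaskInfo> survivors = new ArrayList<TaskInfo>();
    for (TaskInfo t : running) {
      if (t.memUsed > t.maxMemory) {
        toKill.add(t);
        total -= t.memUsed;
      } else {
        survivors.add(t);
      }
    }

    // Pass 2: if still over the limit, kill the most recently launched
    // tasks until the total fits.
    Collections.sort(survivors, new Comparator<TaskInfo>() {
      public int compare(TaskInfo a, TaskInfo b) {
        // Most recent launch first.
        return a.startTime > b.startTime ? -1
             : a.startTime < b.startTime ? 1 : 0;
      }
    });
    for (TaskInfo t : survivors) {
      if (total <= tasksMaxMemory) {
        break;
      }
      toKill.add(t);
      total -= t.memUsed;
    }
    return toKill;
  }
}
{code}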

Tasks killed because they went over their individual memory limit should be 
treated as failed, since they violated their contract. Tasks killed because 
the sum total of memory usage went over the node-wide limit should be 
treated as killed, since it's not really their fault. 
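
In other words, something like the following (illustrative only; the 
end-state names mirror the framework's failed/killed semantics, but this 
helper is hypothetical): 

{code:java}
// Illustrative only: hypothetical mapping of the two kill reasons onto
// end states named after the framework's failed/killed semantics.
class KillReason {
  enum TaskEndState { FAILED, KILLED }

  static TaskEndState endStateFor(boolean brokeOwnLimit) {
    // Broke its own contract: FAILED, counts toward the job's failure limit.
    // Killed only to relieve node-wide pressure: KILLED, and the task is
    // rescheduled without penalty to the job.
    return brokeOwnLimit ? TaskEndState.FAILED : TaskEndState.KILLED;
  }
}
{code}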

Another improvement is to let _mapred.tasktracker.tasks.maxmemory_ be set by 
an external script, which lets Ops control what this value should be. A 
slightly less desirable option, as indicated in some offline discussions 
with Alan W, is to set this value either as an absolute number ("Hadoop may 
use X amount") or as an offset from the total amount of memory on the 
machine ("Hadoop may use all but 4G"). 

> Enhance how memory-intensive user tasks are handled
> ---------------------------------------------------
>
>                 Key: HADOOP-4523
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4523
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: Vivek Ratan
>
> HADOOP-3581 monitors each Hadoop task to see if its memory usage (which 
> includes usage of any tasks spawned by it and so on) is within a per-task 
> limit. If the task's memory usage goes over its limit, the task is killed. 
> This, by itself, is not enough to prevent badly behaving jobs from bringing 
> down nodes. What is also needed is the ability to make sure that the sum 
> total of VM usage of all Hadoop tasks does not exceed a certain limit.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
