[
https://issues.apache.org/jira/browse/HADOOP-4523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642871#action_12642871
]
Vivek Ratan commented on HADOOP-4523:
-------------------------------------
HADOOP-3759 provides a configuration value,
_mapred.tasktracker.tasks.maxmemory_, which specifies the total VM on a machine
available to tasks spawned by the TT. Along with HADOOP-4439, it provides a
cluster-wide default for the maximum VM per task,
_mapred.task.default.maxmemory_, which individual jobs can override.
HADOOP-3581 implements a monitoring mechanism that kills tasks if they go
over their _maxmemory_ value.
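For reference, here is a minimal sketch of reading these two values through the
standard Configuration API. The property names are the ones quoted above; the
-1 defaults and the class name are arbitrary.
{code:java}
// Minimal sketch: reading the TT-wide and per-task VM limits.
// Property names are the ones mentioned above; the -1 defaults are arbitrary.
import org.apache.hadoop.conf.Configuration;

public class MemoryLimits {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Total VM available to all tasks spawned by this TaskTracker (HADOOP-3759).
    long ttMaxMemory = conf.getLong("mapred.tasktracker.tasks.maxmemory", -1L);
    // Cluster-wide default per-task VM limit (HADOOP-4439); individual jobs
    // may override it in their job configuration.
    long taskDefaultMaxMemory = conf.getLong("mapred.task.default.maxmemory", -1L);
    System.out.println("TT limit: " + ttMaxMemory
        + ", per-task default: " + taskDefaultMaxMemory);
  }
}
{code}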
Keeping all this in mind, here's a proposal for what we need to additionally do:
If _tasks.maxmemory_ is set, the TT monitors the total memory usage of all
tasks it has spawned. If that total goes over _tasks.maxmemory_, the TT needs
to kill one or more tasks. It first looks for tasks whose individual memory
usage is over their _default.maxmemory_ value. These are killed (ideally you
would kill just enough of them to bring total usage back under the limit, but
it's not obvious which violators to pick, so it's probably simpler to kill them
all). If no such task is found, or if killing these tasks still leaves us over
the memory limit, we need to pick other tasks to kill. There are many ways to
do this; probably the easiest is to kill the tasks that started most recently.
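To make the selection logic concrete, here is a rough sketch of one way the TT
could pick its victims. Everything in it is illustrative: _TaskRecord_, its
fields, and _selectTasksToKill_ are made-up names, not the existing TaskTracker
code.
{code:java}
// Hypothetical sketch of the kill-selection policy described above.
import java.util.*;

class TaskRecord {
  final String taskId;
  final long memoryUsed;   // current VM usage of the task and its children
  final long memoryLimit;  // per-task limit (job override or default.maxmemory)
  final long startTime;    // when the task was launched, in ms

  TaskRecord(String taskId, long memoryUsed, long memoryLimit, long startTime) {
    this.taskId = taskId;
    this.memoryUsed = memoryUsed;
    this.memoryLimit = memoryLimit;
    this.startTime = startTime;
  }
}

class OverMemoryKillPolicy {
  /**
   * Returns the tasks to kill when total usage exceeds ttLimit
   * (tasks.maxmemory). Tasks over their own limit are selected first; if that
   * is not enough, the most recently started tasks are selected.
   */
  static List<TaskRecord> selectTasksToKill(List<TaskRecord> tasks, long ttLimit) {
    long total = 0;
    for (TaskRecord t : tasks) {
      total += t.memoryUsed;
    }
    List<TaskRecord> toKill = new ArrayList<TaskRecord>();
    if (total <= ttLimit) {
      return toKill;  // nothing to do
    }
    // Pass 1: kill every task that violated its own per-task limit.
    List<TaskRecord> remaining = new ArrayList<TaskRecord>();
    for (TaskRecord t : tasks) {
      if (t.memoryUsed > t.memoryLimit) {
        toKill.add(t);
        total -= t.memoryUsed;
      } else {
        remaining.add(t);
      }
    }
    // Pass 2: if still over the limit, kill the most recently started tasks
    // until total usage drops back under ttLimit.
    Collections.sort(remaining, new Comparator<TaskRecord>() {
      public int compare(TaskRecord a, TaskRecord b) {
        // newest (largest startTime) first
        if (a.startTime == b.startTime) return 0;
        return (a.startTime > b.startTime) ? -1 : 1;
      }
    });
    for (TaskRecord t : remaining) {
      if (total <= ttLimit) {
        break;
      }
      toKill.add(t);
      total -= t.memoryUsed;
    }
    return toKill;
  }
}
{code}
Sorting newest-first means the tasks that have done the least work are
sacrificed first, which matches the suggestion above.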
Tasks that are killed because they went over their own memory limit should be
treated as failed, since they violated their contract. Tasks that are killed
because the node's total memory usage went over the limit should be treated as
killed, since the overage is not really their fault.
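Concretely, the distinction could hang off the same per-task check. The
_TaskOutcome_ enum and _outcomeFor_ method below are illustrative additions to
the _OverMemoryKillPolicy_ sketch above, not the real TaskTracker reporting
path.
{code:java}
// Continuation of the OverMemoryKillPolicy sketch above (illustrative only).
enum TaskOutcome { FAILED, KILLED }

static TaskOutcome outcomeFor(TaskRecord t) {
  // Over its own per-task limit: the task broke its contract, so FAILED.
  // Otherwise it was killed only to relieve the node, so KILLED.
  return (t.memoryUsed > t.memoryLimit) ? TaskOutcome.FAILED : TaskOutcome.KILLED;
}
{code}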
Another improvement is to let _mapred.tasktracker.tasks.maxmemory_ be set by an
external script, which lets Ops control what this value should be. A slightly
less desirable option, as indicated in some offline discussions with Alan W, is
to set this value as either an absolute number ("hadoop may use X amount") or
an offset from the total amount of memory on the machine ("hadoop may use all
but 4g").
> Enhance how memory-intensive user tasks are handled
> ---------------------------------------------------
>
> Key: HADOOP-4523
> URL: https://issues.apache.org/jira/browse/HADOOP-4523
> Project: Hadoop Core
> Issue Type: Improvement
> Reporter: Vivek Ratan
>
> HADOOP-3581 monitors each Hadoop task to see if its memory usage (which
> includes usage of any tasks spawned by it and so on) is within a per-task
> limit. If the task's memory usage goes over its limit, the task is killed.
> This, by itself, is not enough to prevent badly behaving jobs from bringing
> down nodes. What is also needed is the ability to make sure that the sum
> total of VM usage of all Hadoop tasks does not exceed a certain limit.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.