[
https://issues.apache.org/jira/browse/HADOOP-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616444#action_12616444
]
Vivek Ratan commented on HADOOP-3759:
-------------------------------------
Given the proposals in this Jira, and in HADOOP-3581, I wanted to summarize in
one place how this entire feature works. Most, if not all, of this summary is
spread out across the two Jiras. I thought it would help to consolidate it in
one place.
The goal is to allow memory-intensive jobs to run without affecting other jobs,
and to detect and kill tasks that violate their memory contract with Hadoop.
Here is how we propose to do this:
# Each machine can be configured to set a maximum VM limit per task (and its
descendants). This limit, call it MAX_MEM, is specified by the config variable
_mapred.tasktracker.tasks.maxmemory_ and gives the total VM available on a
machine to all TT tasks. By default, each task's maximum limit, call it
MAX_MEM_PER_TASK, is MAX_MEM divided by the number of slots the TT is
configured for. For example, if _mapred.tasktracker.tasks.maxmemory_ is set
to 12GB, and the TT is configured for 2 map and 2 reduce slots,
MAX_MEM_PER_TASK is 3GB, i.e., no single task (and its descendants) should go
over 3GB.
#* For simplicity, we assume that map and reduce tasks are treated
equivalently. If we need to distinguish them, we will have separate sets of
variables for map and reduce tasks.
#* MAX_MEM may have different values on different machines.
#* MAX_MEM is optional (see
[here|https://issues.apache.org/jira/browse/HADOOP-3581?focusedCommentId=12615679#action_12615679]),
so it's possible to set up a cluster with no memory limits per task.
# The TT will detect if a task is using memory above MAX_MEM_PER_TASK and kill
it. This approach is described in HADOOP-3581.
# We'd like users to be able to run memory-intensive jobs, and thus to control
MAX_MEM_PER_TASK for the tasks in their job. Users can optionally specify a
per-task memory limit for their job (this limit applies to each task of the
job). As described
[here|https://issues.apache.org/jira/browse/HADOOP-3581?focusedCommentId=12614295#action_12614295],
we may have separate limits for map and reduce tasks, or just one limit.
# Given a task to run, the TT knows the MAX_MEM_PER_TASK for that task (which
is either a user-specified limit for that job, or a fraction of MAX_MEM, or no
limit at all).
# There is a scheduling component to all this, as described
[here|https://issues.apache.org/jira/browse/HADOOP-3759?focusedCommentId=12613663#action_12613663].
A scheduler may choose to support memory-intensive jobs in different ways.
#* If a scheduler ignores a user-specified limit, it may end up assigning a
task to a TT that has less VM than what the task asked for. This is no worse
than what we have today, but we may still see problems with memory intensive
tasks bringing down a system.
#* The scheduler in HADOOP-3445 will support memory limits and will assign
tasks to TTs only if there's enough VM available. However, tasks with higher
memory limits may take a little longer to be scheduled (this can be discussed
in more detail in HADOOP-3445).
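To make steps 1–4 above concrete, here is a minimal Java sketch of the limit computation and the TT-side kill decision. All names (TaskMemoryPolicy, the method names, the UNLIMITED sentinel) are hypothetical illustrations, not the actual Hadoop classes or APIs; the real behavior is what this Jira and HADOOP-3581 define.

```java
// Hypothetical sketch of the memory-limit logic described above.
// Class and method names are illustrative, not Hadoop's actual API.
public class TaskMemoryPolicy {
    static final long UNLIMITED = -1L; // MAX_MEM is optional

    // mapred.tasktracker.tasks.maxmemory for this TT; UNLIMITED if unset.
    private final long maxMemBytes;
    // Total configured slots (map slots + reduce slots).
    private final int numSlots;

    TaskMemoryPolicy(long maxMemBytes, int numSlots) {
        this.maxMemBytes = maxMemBytes;
        this.numSlots = numSlots;
    }

    // Default MAX_MEM_PER_TASK: MAX_MEM divided by the configured slots.
    long defaultPerTaskLimit() {
        return maxMemBytes == UNLIMITED ? UNLIMITED : maxMemBytes / numSlots;
    }

    // Effective limit for a task: the job's own limit, when specified,
    // overrides the TT default.
    long effectiveLimit(long jobLimit) {
        return jobLimit != UNLIMITED ? jobLimit : defaultPerTaskLimit();
    }

    // TT-side check (HADOOP-3581): kill the task (and its descendants)
    // if the task tree's VM usage exceeds its effective limit.
    boolean shouldKill(long taskTreeVmUsage, long jobLimit) {
        long limit = effectiveLimit(jobLimit);
        return limit != UNLIMITED && taskTreeVmUsage > limit;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        // 12GB total, 2 map + 2 reduce slots -> 3GB per task by default.
        TaskMemoryPolicy p = new TaskMemoryPolicy(12 * gb, 4);
        System.out.println(p.defaultPerTaskLimit() == 3 * gb); // true
        System.out.println(p.shouldKill(4 * gb, UNLIMITED));   // true: over the 3GB default
        System.out.println(p.shouldKill(4 * gb, 6 * gb));      // false: job raised its limit
    }
}
```

Note the example from step 1 falls out directly: 12GB across 4 slots gives a 3GB default, and a job-supplied limit replaces that default rather than adding to it.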
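The scheduling behavior in step 5 could be sketched the same way: a memory-aware scheduler (as HADOOP-3445 proposes) gates assignment on the TT's available VM, while a scheduler that ignores limits simply always assigns. Again, the names here are hypothetical, not the HADOOP-3445 code.

```java
// Hypothetical scheduler-side check: assign a task to a TT only if the
// TT's uncommitted VM covers the task's effective limit.
public class MemoryAwareAssignment {
    // ttFreeVm:  VM on the TT not yet committed to already-running tasks.
    // taskLimit: effective per-task limit in bytes; negative means no limit.
    static boolean canAssign(long ttFreeVm, long taskLimit) {
        if (taskLimit < 0) {
            // No limit declared: fall back to today's behavior and assign.
            // The task may still exhaust the node's memory (the "no worse
            // than today" case noted above).
            return true;
        }
        return ttFreeVm >= taskLimit;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        System.out.println(canAssign(6 * gb, 3 * gb)); // true
        System.out.println(canAssign(2 * gb, 3 * gb)); // false: task waits for a roomier TT
    }
}
```

This also shows why high-limit tasks may wait longer to be scheduled: they are only assignable to TTs whose free VM is at least their limit.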
> Provide ability to run memory intensive jobs without affecting other running
> tasks on the nodes
> -----------------------------------------------------------------------------------------------
>
> Key: HADOOP-3759
> URL: https://issues.apache.org/jira/browse/HADOOP-3759
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Hemanth Yamijala
> Assignee: Hemanth Yamijala
> Fix For: 0.19.0
>
> Attachments: HADOOP-3759.patch
>
>
> In HADOOP-3581, we are discussing how to prevent memory intensive tasks from
> affecting Hadoop daemons and other tasks running on a node. A related
> requirement is that users be provided an ability to run jobs which are memory
> intensive. The system must provide enough knobs to allow such jobs to be run
> while still maintaining the requirements of HADOOP-3581.