[
https://issues.apache.org/jira/browse/HADOOP-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616444#action_12616444
]
Vivek Ratan commented on HADOOP-3759:
-------------------------------------
Given the proposals in this Jira, and in HADOOP-3581, I wanted to summarize in
one place how this entire feature works. Most, if not all, of this summary is
spread out across the two Jiras. I thought it would help to consolidate it in
one place.
The goal is to allow memory-intensive jobs to run without affecting other jobs,
and to detect and kill tasks that violate their memory contract with Hadoop.
Here is how we propose to do this:
# Each machine can be configured to set a maximum VM limit per task (and its
descendants). This limit, call it MAX_MEM, is specified by the config variable
_mapred.tasktracker.tasks.maxmemory_ and gives the total VM available on a
machine to all TT tasks. By default, each task's maximum limit, call it
MAX_MEM_PER_TASK, is MAX_MEM divided by the number of slots the TT is
configured for. For example, if _mapred.tasktracker.tasks.maxmemory_ is set
to 12GB, and the TT is configured for 2 map and 2 reduce slots,
MAX_MEM_PER_TASK is 3GB, i.e., no single task (and its descendants) should go
over 3GB.
#* For simplicity, we assume that map and reduce tasks are treated
equivalently. If we need to distinguish them, we will have separate sets of
variables for map and reduce tasks.
#* MAX_MEM may have different values on different machines.
#* MAX_MEM is optional (see
[here|https://issues.apache.org/jira/browse/HADOOP-3581?focusedCommentId=12615679#action_12615679]),
so it's possible to set up a cluster with no memory limits per task.
# The TT will detect if a task is using memory above MAX_MEM_PER_TASK and kill
it. This approach is described in HADOOP-3581.
# We'd like users to be able to run memory-intensive jobs, and thus to control
MAX_MEM_PER_TASK for the tasks in their job. Users can optionally specify a
per-task memory limit for their job (this limit applies to each task of the
job). As described
[here|https://issues.apache.org/jira/browse/HADOOP-3581?focusedCommentId=12614295#action_12614295],
we may have separate limits for map and reduce tasks, or just one limit.
# Given a task to run, the TT knows the MAX_MEM_PER_TASK for that task (which
is either a user-specified limit for that job, or a fraction of MAX_MEM, or no
limit at all).
# There is a scheduling component to all this, as described
[here|https://issues.apache.org/jira/browse/HADOOP-3759?focusedCommentId=12613663#action_12613663].
A scheduler may choose to support memory-intensive jobs in different ways.
#* If a scheduler ignores a user-specified limit, it may end up assigning a
task to a TT that has less VM than what the task asked for. This is no worse
than what we have today, but we may still see problems with memory intensive
tasks bringing down a system.
#* The scheduler in HADOOP-3445 will support memory limits and will assign
tasks to TTs only if there's enough VM available. However, tasks with higher
memory limits may take a little longer to be scheduled (this can be discussed
in more detail in HADOOP-3445).
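To make steps 1–4 above concrete, here is a minimal Java sketch of the limit computation and the TT-side kill decision. All names (TaskMemoryPolicy, the method names, the UNLIMITED sentinel) are hypothetical illustrations, not the actual Hadoop classes or APIs; the real behavior is what this Jira and HADOOP-3581 define.

```java
// Hypothetical sketch of the memory-limit logic described above.
// Class and method names are illustrative, not Hadoop's actual API.
public class TaskMemoryPolicy {
    static final long UNLIMITED = -1L; // MAX_MEM is optional

    // mapred.tasktracker.tasks.maxmemory for this TT; UNLIMITED if unset.
    private final long maxMemBytes;
    // Total configured slots (map slots + reduce slots).
    private final int numSlots;

    TaskMemoryPolicy(long maxMemBytes, int numSlots) {
        this.maxMemBytes = maxMemBytes;
        this.numSlots = numSlots;
    }

    // Default MAX_MEM_PER_TASK: MAX_MEM divided by the configured slots.
    long defaultPerTaskLimit() {
        return maxMemBytes == UNLIMITED ? UNLIMITED : maxMemBytes / numSlots;
    }

    // Effective limit for a task: the job's own limit, when specified,
    // overrides the TT default.
    long effectiveLimit(long jobLimit) {
        return jobLimit != UNLIMITED ? jobLimit : defaultPerTaskLimit();
    }

    // TT-side check (HADOOP-3581): kill the task (and its descendants)
    // if the task tree's VM usage exceeds its effective limit.
    boolean shouldKill(long taskTreeVmUsage, long jobLimit) {
        long limit = effectiveLimit(jobLimit);
        return limit != UNLIMITED && taskTreeVmUsage > limit;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        // 12GB total, 2 map + 2 reduce slots -> 3GB per task by default.
        TaskMemoryPolicy p = new TaskMemoryPolicy(12 * gb, 4);
        System.out.println(p.defaultPerTaskLimit() == 3 * gb); // true
        System.out.println(p.shouldKill(4 * gb, UNLIMITED));   // true: over the 3GB default
        System.out.println(p.shouldKill(4 * gb, 6 * gb));      // false: job raised its limit
    }
}
```

Note the example from step 1 falls out directly: 12GB across 4 slots gives a 3GB default, and a job-supplied limit replaces that default rather than adding to it.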
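The scheduling behavior in step 5 could be sketched the same way: a memory-aware scheduler (as HADOOP-3445 proposes) gates assignment on the TT's available VM, while a scheduler that ignores limits simply always assigns. Again, the names here are hypothetical, not the HADOOP-3445 code.

```java
// Hypothetical scheduler-side check: assign a task to a TT only if the
// TT's uncommitted VM covers the task's effective limit.
public class MemoryAwareAssignment {
    // ttFreeVm:  VM on the TT not yet committed to already-running tasks.
    // taskLimit: effective per-task limit in bytes; negative means no limit.
    static boolean canAssign(long ttFreeVm, long taskLimit) {
        if (taskLimit < 0) {
            // No limit declared: fall back to today's behavior and assign.
            // The task may still exhaust the node's memory (the "no worse
            // than today" case noted above).
            return true;
        }
        return ttFreeVm >= taskLimit;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        System.out.println(canAssign(6 * gb, 3 * gb)); // true
        System.out.println(canAssign(2 * gb, 3 * gb)); // false: task waits for a roomier TT
    }
}
```

This also shows why high-limit tasks may wait longer to be scheduled: they are only assignable to TTs whose free VM is at least their limit.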
> Provide ability to run memory intensive jobs without affecting other running
> tasks on the nodes
> -----------------------------------------------------------------------------------------------
>
> Key: HADOOP-3759
> URL: https://issues.apache.org/jira/browse/HADOOP-3759
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Hemanth Yamijala
> Assignee: Hemanth Yamijala
> Fix For: 0.19.0
>
> Attachments: HADOOP-3759.patch
>
>
> In HADOOP-3581, we are discussing how to prevent memory intensive tasks from
> affecting Hadoop daemons and other tasks running on a node. A related
> requirement is that users be provided an ability to run jobs which are memory
> intensive. The system must provide enough knobs to allow such jobs to be run
> while still maintaining the requirements of HADOOP-3581.