[
https://issues.apache.org/jira/browse/HADOOP-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620928#action_12620928
]
Hemanth Yamijala commented on HADOOP-3759:
------------------------------------------
The attached file implements the proposal mentioned above.
Following is a summary of the changes:
- Defined two configuration variables: mapred.tasktracker.tasks.maxmemory and
mapred.task.maxmemory. The first corresponds to MAX_MEM and the second to
MAX_MEM_PER_TASK in the comments above. Both default to -1, which turns them
off.
- JobConf provides accessors for the above
- In TaskTrackerStatus, defined a way to pass the free memory and the default
max memory per task from the TaskTracker to the JobTracker. These are passed in
a Map, so that other resource details we might want to pass in the future can
be added without changing the protocol.
- In TaskTracker, implemented the computation of free memory. Also, when a job
does not define mapred.task.maxmemory, the TaskTracker sets it to MAX_MEM /
number of slots while localizing the task.
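To illustrate the per-task default described in the last point, here is a minimal sketch of the fallback logic. All names here (the class, the method, DISABLED_MEMORY_LIMIT) are illustrative only and are not the identifiers used in the patch:

```java
// Hypothetical sketch of the per-task memory default: a job-level setting
// wins; otherwise MAX_MEM is divided evenly across the tracker's slots.
public class TaskMemoryDefaults {
    // Mirrors the -1 "turned off" default mentioned above (illustrative name).
    static final long DISABLED_MEMORY_LIMIT = -1L;

    // Returns the effective per-task memory limit in bytes.
    static long effectiveTaskLimit(long jobTaskMaxMemory,
                                   long trackerMaxMemory,
                                   int numSlots) {
        if (jobTaskMaxMemory != DISABLED_MEMORY_LIMIT) {
            // The job configured mapred.task.maxmemory itself.
            return jobTaskMaxMemory;
        }
        if (trackerMaxMemory == DISABLED_MEMORY_LIMIT) {
            // Neither limit is configured; the feature stays off.
            return DISABLED_MEMORY_LIMIT;
        }
        // Fall back to an even share of the tracker-wide limit.
        return trackerMaxMemory / numSlots;
    }

    public static void main(String[] args) {
        // Job sets nothing; tracker allows 4 GB across 4 slots -> 1 GB per task.
        System.out.println(
            effectiveTaskLimit(-1L, 4L * 1024 * 1024 * 1024, 4));
    }
}
```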
The patch contains some additional log statements that I will remove once the
review is complete. It is also missing unit tests. I'd request a review of the
code with these points in mind.
Regarding tests, I've tested the changes manually, and I am looking for ideas
on how to automate them. Ideally, we would test the following: configure the
memory-related variables, schedule tasks in a predetermined order, and verify
each time that the free memory is computed correctly. The last part seems to
require hooks into the heartbeat-processing code on the JT or TT.
Alternatively, we could make the free-memory computation package-private, but
that seems very hacky. Any other ideas?
> Provide ability to run memory intensive jobs without affecting other running
> tasks on the nodes
> -----------------------------------------------------------------------------------------------
>
> Key: HADOOP-3759
> URL: https://issues.apache.org/jira/browse/HADOOP-3759
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Hemanth Yamijala
> Assignee: Hemanth Yamijala
> Fix For: 0.19.0
>
> Attachments: HADOOP-3759.patch, HADOOP-3759.patch
>
>
> In HADOOP-3581, we are discussing how to prevent memory intensive tasks from
> affecting Hadoop daemons and other tasks running on a node. A related
> requirement is that users be provided an ability to run jobs which are memory
> intensive. The system must provide enough knobs to allow such jobs to be run
> while still maintaining the requirements of HADOOP-3581.