[ 
https://issues.apache.org/jira/browse/HADOOP-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-3581:
--------------------------------------------

    Attachment: patch_3581_0.1.txt

Attaching a first patch to help move the discussion forward. The patch is 
still raw and needs a good deal of further work. Much of it is just a proof 
of concept; enough abstraction is in place that the actual implementation 
can be changed easily.

At present,
- the process tracker works only on Linux; it uses the proc file system and 
the per-process directories under it.
- it uses mapred.child.ulimit to limit the *total* vmem usage of all the 
tasks' process trees.
- once it detects that the *total* vmem usage of all tasks has exceeded the 
specified limit, it calls findOOMTaskstoKill to find the tasks to be killed.
- findOOMTaskstoKill returns the list of tasks to be killed. Currently it 
returns only one task, the one with the highest memory usage.
- after getting the list of tasks to be killed, it kills each of the 
corresponding process trees by issuing individual 'kill <pid>' commands 
(SIGTERM).
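As a rough illustration of the monitoring and selection steps above, here is 
a minimal Java sketch. The names parseVmemFromStat and findTaskToKill are 
illustrative only, not identifiers from the patch; on Linux, a process's 
virtual memory size appears as the vsize field (field 23) of the single-line 
/proc/<pid>/stat file:

```java
import java.util.Map;

// Hypothetical sketch of per-tree vmem accounting and victim selection;
// not the actual patch code.
public class VmemSketch {

    // /proc/<pid>/stat is one line; field 23 (1-indexed) is vsize in
    // bytes. The comm field (2) may contain spaces, so skip past the
    // closing ')' before splitting on whitespace.
    static long parseVmemFromStat(String statLine) {
        int close = statLine.lastIndexOf(')');
        String[] rest = statLine.substring(close + 2).split("\\s+");
        // rest[0] is field 3 (state), so vsize (field 23) is rest[20]
        return Long.parseLong(rest[20]);
    }

    // Mirror of the current findOOMTaskstoKill policy: if the total vmem
    // of all task trees exceeds the limit, pick the single task with the
    // highest usage; otherwise pick nothing.
    static String findTaskToKill(Map<String, Long> vmemPerTask, long limit) {
        long total = 0;
        long max = -1;
        String victim = null;
        for (Map.Entry<String, Long> e : vmemPerTask.entrySet()) {
            total += e.getValue();
            if (e.getValue() > max) {
                max = e.getValue();
                victim = e.getKey();
            }
        }
        return total > limit ? victim : null;
    }
}
```

The victim's whole process tree would then be terminated with 'kill <pid>' 
(SIGTERM), as described above.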

Needs thought/TODO:
- Introduce separate configuration properties for the memory usage of map 
tasks and reduce tasks? Remove the previous use of mapred.child.ulimit and 
the corresponding code that sets ulimits?
- May want to monitor whether the kill went through, and issue a subsequent 
SIGKILL if needed. The kill mechanism might change entirely if we decide to 
start the tasks using job control.
- May want to refactor the code a bit and merge killOOMTasks with 
killOverflowingTasks. Later, move all of this to a single place when 
HADOOP-3675 goes in.
- Many code paths are not synchronized yet, which could result in race 
conditions/threading errors.

We still need a decision on whether we want to 1) limit the aggregate usage 
over all tasks' process trees or 2) limit usage per task's process tree. I 
believe both can be implemented with the framework set up in the current 
patch.
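To make the two options concrete, here is a small sketch (illustrative names 
only; neither check is in the patch) of the predicate each policy would 
evaluate against the per-tree vmem samples the monitor already collects:

```java
import java.util.Collection;

// Hypothetical sketch contrasting the two limiting policies under
// discussion; both consume the same per-task-tree vmem samples.
public class LimitPolicies {

    // Option 1: a single aggregate limit over all task trees on the node.
    static boolean aggregateOverLimit(Collection<Long> vmemPerTree,
                                      long totalLimit) {
        long total = 0;
        for (long v : vmemPerTree) {
            total += v;
        }
        return total > totalLimit;
    }

    // Option 2: an independent limit on each task's process tree.
    static boolean anyTreeOverLimit(Collection<Long> vmemPerTree,
                                    long perTaskLimit) {
        for (long v : vmemPerTree) {
            if (v > perTaskLimit) {
                return true;
            }
        }
        return false;
    }
}
```

Option 1 matches what the current patch does; option 2 would change only the 
check, not the monitoring framework.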

> Prevent memory intensive user tasks from taking down nodes
> ----------------------------------------------------------
>
>                 Key: HADOOP-3581
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3581
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Hemanth Yamijala
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: patch_3581_0.1.txt
>
>
> Sometimes user Map/Reduce applications can get extremely memory intensive, 
> maybe due to some inadvertent bugs in the user code, or the amount of data 
> processed. When this happens, the user tasks start to interfere with the 
> proper execution of other processes on the node, including other Hadoop 
> daemons like the DataNode and TaskTracker. Thus, the node would become 
> unusable for any Hadoop tasks. There should be a way to prevent such tasks 
> from bringing down the node.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
