[
https://issues.apache.org/jira/browse/HADOOP-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vinod Kumar Vavilapalli updated HADOOP-3581:
--------------------------------------------
Attachment: patch_3581_0.1.txt
Attaching a first patch to help move the discussion forward. The patch is
still raw and needs a good deal of further work. Much of it is a proof of
concept; enough abstraction is in place so that the actual implementation can
be changed easily.
At present,
- the process tracker works only on Linux; it uses the proc filesystem and
the per-process directories inside it.
- uses mapred.child.ulimit to limit the *total* vmem usage of all the tasks'
process trees.
- once it detects that the *total* vmem usage of all tasks has crossed the
specified limit, it calls findOOMTaskstoKill to find the tasks to be killed.
- findOOMTaskstoKill returns the list of tasks to be killed. Currently it
returns only one task, the one with the highest memory usage.
- after getting the list of tasks to be killed, it kills each of the
corresponding process trees by issuing individual 'kill <pid>' commands
(SIGTERM); a rough sketch of this whole flow follows the list.
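For concreteness, here is a minimal sketch of the monitoring flow just
described (Linux only). All names in it (TaskMemoryWatcher, readVmemBytes,
checkAndKill) are invented for illustration and are not taken from the patch;
the sketch also reads only each tree's root pid, where the real code
aggregates usage over the entire process tree.
{code:java}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Map;

public class TaskMemoryWatcher {

  // Read VmSize (virtual memory) in bytes from /proc/<pid>/status.
  static long readVmemBytes(int pid) throws IOException {
    BufferedReader in =
        new BufferedReader(new FileReader("/proc/" + pid + "/status"));
    try {
      String line;
      while ((line = in.readLine()) != null) {
        if (line.startsWith("VmSize:")) {
          // Format: "VmSize:    123456 kB"
          String[] parts = line.trim().split("\\s+");
          return Long.parseLong(parts[1]) * 1024;
        }
      }
    } finally {
      in.close();
    }
    return 0;
  }

  // One round of monitoring: sum usage over all tracked root pids and, if
  // the total crosses the limit, SIGTERM the heaviest process tree. This is
  // the single-victim policy of findOOMTaskstoKill, inlined for brevity.
  static void checkAndKill(Map<String, Integer> taskToRootPid, long limitBytes)
      throws IOException {
    long total = 0;
    String worstTask = null;
    long worstUsage = -1;
    for (Map.Entry<String, Integer> e : taskToRootPid.entrySet()) {
      long usage = readVmemBytes(e.getValue());
      total += usage;
      if (usage > worstUsage) {
        worstUsage = usage;
        worstTask = e.getKey();
      }
    }
    if (total > limitBytes && worstTask != null) {
      // SIGTERM via 'kill <pid>'; the real code kills every pid in the tree.
      Runtime.getRuntime().exec(
          new String[] { "kill", String.valueOf(taskToRootPid.get(worstTask)) });
    }
  }
}
{code}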
Need thought/TODO:
- Introduce separate configuration properties for the memory usage of map
tasks and reduce tasks (see the config sketch after this list)? Remove the
previous use of mapred.child.ulimit and the corresponding code that sets
ulimits?
- May want to monitor whether the kill went through, and then issue a
subsequent SIGKILL as needed (see the escalation sketch after this list). The
kill mechanism might change entirely if we start the tasks using job control.
- May want to refactor the code a bit and merge killOOMTasks with
killOverflowingTasks, and later move all of this to a single place when
HADOOP-3675 goes in.
- Many code paths are not synchronized yet, which might result in race
conditions.
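On the first TODO item, the split could look something like the following.
Neither property name exists yet; both are invented here purely for
discussion.
{code:java}
import org.apache.hadoop.conf.Configuration;

public class PerTaskTypeLimits {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Separate limits per task type, instead of a single mapred.child.ulimit.
    // Both property names below are hypothetical.
    long mapLimitKb = conf.getLong("mapred.map.child.vmem.limit", -1);
    long reduceLimitKb = conf.getLong("mapred.reduce.child.vmem.limit", -1);
    System.out.println("map limit (kB): " + mapLimitKb
        + ", reduce limit (kB): " + reduceLimitKb);
  }
}
{code}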
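And a possible shape for the SIGTERM-then-SIGKILL escalation from the second
item. Again, the names are illustrative, and the /proc existence check is a
simple Linux-only liveness test, not what the patch does.
{code:java}
import java.io.File;
import java.io.IOException;

public class KillEscalation {
  // Send SIGTERM, wait out a grace period, then SIGKILL if still alive.
  public static void sigTermThenKill(int pid, long graceMillis)
      throws IOException, InterruptedException {
    Runtime.getRuntime().exec(
        new String[] { "kill", String.valueOf(pid) });  // SIGTERM
    Thread.sleep(graceMillis);
    if (isAlive(pid)) {
      // The process ignored SIGTERM or is still shutting down; force it.
      Runtime.getRuntime().exec(
          new String[] { "kill", "-9", String.valueOf(pid) });
    }
  }

  // On Linux, a live pid shows up as a /proc/<pid> directory.
  static boolean isAlive(int pid) {
    return new File("/proc/" + pid).exists();
  }
}
{code}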
We still need a decision on whether we want to (1) limit aggregate usage over
all tasks' process trees, or (2) limit usage per task's process tree. I
believe both can be implemented with the framework set up in the current
patch. A small sketch contrasting the two checks follows.
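To make the two options concrete, here is how the checks would differ.
TaskInfo and both method names are invented for this comparison only.
{code:java}
import java.util.List;

public class LimitPolicySketch {
  // Invented holder for per-task usage; not a class from the patch.
  static class TaskInfo {
    String id;
    long treeVmemBytes;   // current vmem usage of this task's process tree
  }

  // Option 1: one aggregate limit across all tasks on the node; the heaviest
  // task is a victim only when the node as a whole is over the limit.
  static TaskInfo overAggregateLimit(List<TaskInfo> tasks, long nodeLimitBytes) {
    long total = 0;
    TaskInfo worst = null;
    for (TaskInfo t : tasks) {
      total += t.treeVmemBytes;
      if (worst == null || t.treeVmemBytes > worst.treeVmemBytes) {
        worst = t;
      }
    }
    return (total > nodeLimitBytes) ? worst : null;
  }

  // Option 2: an independent limit on each task's process tree; a task is a
  // victim for its own usage, regardless of the node-wide total.
  static TaskInfo overPerTaskLimit(List<TaskInfo> tasks, long perTaskLimitBytes) {
    for (TaskInfo t : tasks) {
      if (t.treeVmemBytes > perTaskLimitBytes) {
        return t;
      }
    }
    return null;
  }
}
{code}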
> Prevent memory intensive user tasks from taking down nodes
> ----------------------------------------------------------
>
> Key: HADOOP-3581
> URL: https://issues.apache.org/jira/browse/HADOOP-3581
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Hemanth Yamijala
> Assignee: Vinod Kumar Vavilapalli
> Attachments: patch_3581_0.1.txt
>
>
> Sometimes user Map/Reduce applications can get extremely memory intensive,
> maybe due to some inadvertent bugs in the user code, or the amount of data
> processed. When this happens, the user tasks start to interfere with the
> proper execution of other processes on the node, including other Hadoop
> daemons like the DataNode and TaskTracker. Thus, the node would become
> unusable for any Hadoop tasks. There should be a way to prevent such tasks
> from bringing down the node.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.