[
https://issues.apache.org/jira/browse/HADOOP-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607650#action_12607650
]
Vinod Kumar Vavilapalli commented on HADOOP-3581:
-------------------------------------------------
- Ulimits
This is the approach taken by HADOOP-3280: set ulimit -v for the launched tasks. Doing
this limits the virtual memory usable by the launched task process. All sub-processes
inherit the same limit and so are individually capped by it. But the limit can easily be
circumvented by a rogue or badly written task that repeatedly forks sub-processes,
because each forked child gets its own separate allowance. The other mechanism discussed
on HADOOP-3280 - system-wide per-user limits via /etc/security/limits.conf - is just
another form of ulimit -v: it is ulimit -v configured system-wide, kicking in when a user
logs in (login shell) and thus applying to all subsequent sub-processes. It fails for the
same reason of repeated forking.
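To make the mechanism concrete, here is a minimal, hypothetical sketch (not the HADOOP-3280 patch; class, method and command names are made up) of how a launcher could apply ulimit -v by wrapping the task command in a bash invocation:

{code:java}
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Sketch only: wrap a task command with "ulimit -v" before exec'ing it, so the
// launched process (and anything it forks) runs under a per-process virtual
// memory limit. Class, method and command names here are hypothetical.
public class UlimitTaskLauncher {

  /** Builds a bash command line that applies ulimit -v (in KB) and then execs the task. */
  static List<String> wrapWithUlimit(long virtualMemKB, String taskCommand) {
    String script = "ulimit -v " + virtualMemKB + "; exec " + taskCommand;
    return Arrays.asList("bash", "-c", script);
  }

  public static void main(String[] args) throws IOException, InterruptedException {
    // Example: cap a (hypothetical) child task at ~512 MB of virtual memory.
    List<String> cmd = wrapWithUlimit(512 * 1024, "java -Xmx200m org.example.ChildTask");
    Process child = new ProcessBuilder(cmd).inheritIO().start();
    System.exit(child.waitFor());
  }
}
{code}

Note that the limit is per process: every forked child starts with its own fresh allowance, which is exactly why repeated forking defeats this approach.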
As discussed on HADOOP-3280, in the long term we will likely run TaskTrackers as root and
setuid to the submitting user, and at that point ulimits definitely *will not* help in
addressing this issue: each task might be well within its limits, yet tasks belonging to
different users could together use up significant memory and affect the running of the
Hadoop daemons.
- TaskTracker tracking memory usage of tasks' process trees
In this approach, the TaskTracker tracks the memory usage of all the tasks and their
sub-processes (irrespective of which user runs which task). We have two options here.
Option 1:
Put a limit on the total memory used by all the children of the TaskTracker (an
aggregate limit over all tasks). This way we can be sure that the running of the Hadoop
daemons is not affected. But it has the obvious disadvantage of tasks intruding on each
other, for example one task using up all of the memory within the aggregate limit and
causing other tasks to fail.
Option 2:
Put an individual cap on the memory usable by each task, perhaps with separate limits
for map tasks and reduce tasks.
The TaskTracker tracks the resource usage of each task and its sub-process tree. Once a
task crosses the configured limit, the TaskTracker kills the task's process tree and
reports accordingly. We need per-platform implementations to make this portable.
Currently, we can restrict ourselves to using procfs on POSIX systems, but enough
abstraction should be in place to allow implementations on other platforms, e.g. using
the Windows API. A sketch of the procfs-based tracking follows below.
In the presence of free slots (if at all), this choice has the disadvantage that running
tasks may under-utilize the available memory (not a significant issue?).
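To make the procfs-based tracking concrete, below is a rough, illustrative sketch (class and method names are mine, not a proposed interface) that sums the virtual memory of a task's whole process tree by scanning /proc/<pid>/status for the PPid and VmSize fields:

{code:java}
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of procfs-based tracking: walk /proc, build the parent map
// from each process's PPid, and sum VmSize over the subtree rooted at the task.
public class ProcfsTreeMemory {

  /** Returns total virtual memory (KB) used by rootPid and all its descendants. */
  static long treeVirtualMemoryKB(int rootPid) throws IOException {
    Map<Integer, Integer> parentOf = new HashMap<Integer, Integer>();
    Map<Integer, Long> vmemKB = new HashMap<Integer, Long>();

    File[] entries = new File("/proc").listFiles();
    if (entries == null) {
      throw new IOException("/proc not available on this platform");
    }
    for (File dir : entries) {
      if (!dir.getName().matches("\\d+")) {
        continue;                        // skip non-process entries like "meminfo"
      }
      int pid = Integer.parseInt(dir.getName());
      try {
        BufferedReader in = new BufferedReader(new FileReader(new File(dir, "status")));
        try {
          String line;
          while ((line = in.readLine()) != null) {
            if (line.startsWith("PPid:")) {
              parentOf.put(pid, Integer.parseInt(line.replaceAll("\\D", "")));
            } else if (line.startsWith("VmSize:")) {
              vmemKB.put(pid, Long.parseLong(line.replaceAll("\\D", "")));
            }
          }
        } finally {
          in.close();
        }
      } catch (IOException ignored) {
        // the process may have exited while we were reading its status; skip it
      }
    }

    // Walk the tree from the task's root pid, accumulating every descendant.
    // The O(n^2) child lookup is fine for a sketch; a real monitor would index it.
    long total = 0;
    List<Integer> frontier = new ArrayList<Integer>();
    frontier.add(rootPid);
    while (!frontier.isEmpty()) {
      int pid = frontier.remove(frontier.size() - 1);
      Long vm = vmemKB.get(pid);
      total += (vm == null) ? 0 : vm;    // some entries may not report VmSize
      for (Map.Entry<Integer, Integer> e : parentOf.entrySet()) {
        if (e.getValue() == pid) {
          frontier.add(e.getKey());
        }
      }
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    int pid = Integer.parseInt(args[0]);
    System.out.println("Tree virtual memory: " + treeVirtualMemoryKB(pid) + " KB");
  }
}
{code}

A monitoring thread in the TaskTracker could poll something like this periodically for each running task's root pid and kill the whole tree once the total crosses the configured limit.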
Comments?
> Prevent memory intensive user tasks from taking down nodes
> ----------------------------------------------------------
>
> Key: HADOOP-3581
> URL: https://issues.apache.org/jira/browse/HADOOP-3581
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Hemanth Yamijala
> Assignee: Vinod Kumar Vavilapalli
>
> Sometimes user Map/Reduce applications can get extremely memory intensive,
> maybe due to some inadvertent bugs in the user code, or the amount of data
> processed. When this happens, the user tasks start to interfere with the
> proper execution of other processes on the node, including other Hadoop
> daemons like the DataNode and TaskTracker. Thus, the node would become
> unusable for any Hadoop tasks. There should be a way to prevent such tasks
> from bringing down the node.