[
https://issues.apache.org/jira/browse/HADOOP-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612833#action_12612833
]
Vinod Kumar Vavilapalli commented on HADOOP-3581:
-------------------------------------------------
After discussing this with Hemanth, we concluded that the current code needs
to be reorganized. I propose the following:
TaskTracker maintains a ProcessTree object for each task.
{code}
public abstract class ProcessTree {
  /** Initialize the process tree. */
  public abstract void initialize();

  /** Destroy the process tree. */
  public abstract void destroy();

  /** Return the total virtual memory usage of this process tree. */
  public abstract long getVmem();
}
{code}
My previous code (ProcessTracker: initialize, kill and getCurrentVmemUsage)
would move to a class LinuxProcessTree that extends ProcessTree. Cygwin
seems to support the proc filesystem, so the same class could be used on
Windows, though this still needs to be confirmed. For Solaris and other
operating systems we need further classes extending ProcessTree.
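To make the LinuxProcessTree idea concrete, here is a minimal sketch of how
getVmem could be computed from the proc filesystem. It is not the code from
the patch: the class name LinuxProcessTreeSketch, the injectable procRoot
argument and the parseStat helper are all hypothetical, and it assumes the
Linux /proc/<pid>/stat layout in which ppid is field 4 and vsize is field 23.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: sums vsize over a process and all its descendants. */
class LinuxProcessTreeSketch {
    private final Path procRoot; // normally /proc; injectable so it can be tested
    private final int rootPid;   // pid of the task process at the top of the tree

    LinuxProcessTreeSketch(Path procRoot, int rootPid) {
        this.procRoot = procRoot;
        this.rootPid = rootPid;
    }

    /** Returns {ppid, vsize} parsed from the contents of a /proc/<pid>/stat file. */
    static long[] parseStat(String statLine) {
        // Field 2 (comm) is parenthesised and may contain spaces,
        // so only split what follows the last ')'.
        String rest = statLine.substring(statLine.lastIndexOf(')') + 1).trim();
        String[] f = rest.split("\\s+");
        // f[0] is field 3 (state), so ppid (field 4) is f[1] and vsize (field 23) is f[20].
        return new long[] { Long.parseLong(f[1]), Long.parseLong(f[20]) };
    }

    /** Rebuilds the tree from procRoot and returns the total vsize in bytes. */
    long getVmem() throws IOException {
        Map<Integer, List<Integer>> children = new HashMap<>();
        Map<Integer, Long> vsize = new HashMap<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(procRoot, "[0-9]*")) {
            for (Path dir : dirs) {
                try {
                    int pid = Integer.parseInt(dir.getFileName().toString());
                    String stat = new String(
                        Files.readAllBytes(dir.resolve("stat")), StandardCharsets.UTF_8);
                    long[] s = parseStat(stat);
                    vsize.put(pid, s[1]);
                    children.computeIfAbsent((int) s[0], k -> new ArrayList<>()).add(pid);
                } catch (IOException | RuntimeException e) {
                    // The process exited meanwhile, or the entry is not a pid directory.
                }
            }
        }
        // Walk from the root pid down through all descendants.
        long total = 0;
        Deque<Integer> todo = new ArrayDeque<>();
        todo.push(rootPid);
        while (!todo.isEmpty()) {
            int pid = todo.pop();
            total += vsize.getOrDefault(pid, 0L);
            for (int child : children.getOrDefault(pid, Collections.<Integer>emptyList())) {
                todo.push(child);
            }
        }
        return total;
    }
}
```

On a Linux node this would be used as
new LinuxProcessTreeSketch(Paths.get("/proc"), pid).getVmem(). The sketch
rescans /proc on every call, which keeps it simple but is not free on a busy
node.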
Getting the pid of the task process: In the first patch, the way the pid of a
process is obtained is hacky. Following the workaround put forward here
(http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4244896), it reads
the value of a private variable by suppressing the standard Java language
access checks. This won't work if a security policy prevents it.
Instead, we should let TaskTracker ask the Task, via TaskUmbilicalProtocol,
for its pid through a getPid call:
{code}
public Integer getPid();
{code}
The task itself should return its pid (the native pid on *NIX, the Cygwin pid
on Windows) to the TT, perhaps by calling native code.
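For illustration only, here is one way a task could look up its own pid
without native code. This is a swapped-in technique, not what is proposed
above: it relies on the JVM runtime name having the form "pid@hostname",
which is a common JVM convention but not guaranteed by the Java
specification. The class name TaskPidSketch and the parsePid helper are
hypothetical.

```java
import java.lang.management.ManagementFactory;

/** Hypothetical sketch: a task discovering its own pid from inside the JVM. */
class TaskPidSketch {
    /** Parses a runtime name of the form "pid@hostname"; returns null otherwise. */
    static Integer parsePid(String runtimeName) {
        int at = runtimeName.indexOf('@');
        if (at <= 0) {
            return null; // not in the assumed "pid@hostname" form
        }
        try {
            return Integer.valueOf(runtimeName.substring(0, at));
        } catch (NumberFormatException e) {
            return null; // prefix is not numeric on this JVM
        }
    }

    /** Returns this JVM's pid, or null if it cannot be determined this way. */
    static Integer getPid() {
        return parsePid(ManagementFactory.getRuntimeMXBean().getName());
    }
}
```

Returning null rather than throwing lets the caller fall back to another
mechanism, such as the native call suggested above, when the name is not in
the assumed form.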
Side notes:
- Should we reconstruct the process tree on demand whenever getVmem() is
called, or should we start a thread and update it periodically?
- If Cygwin also supports process groups and sessions (I can see setpgid,
setsid etc. in Cygwin's POSIX-compatible API here
http://cygwin.com/cygwin-api/compatibility.html#std-susv3 ), we might want to
change the implementation of destroy. That should be a separate issue, in
which we also modify how we start tasks.
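To illustrate the first side note, here is a minimal sketch of the periodic
option: a background thread samples the usage and getVmem() only returns the
last cached value. The class name PeriodicVmemSketch, the LongSupplier
sampler and the -1 "no sample yet" value are all assumptions made for this
example.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical sketch: caches a periodically sampled vmem value. */
class PeriodicVmemSketch {
    private final AtomicLong lastVmem = new AtomicLong(-1); // -1 means no sample yet
    private final LongSupplier sampler; // e.g. something that rescans the process tree
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "vmem-sampler");
            t.setDaemon(true); // do not keep the JVM alive just for sampling
            return t;
        });

    PeriodicVmemSketch(LongSupplier sampler) {
        this.sampler = sampler;
    }

    /** Takes one sample; on failure the previous value is kept. */
    void refresh() {
        try {
            lastVmem.set(sampler.getAsLong());
        } catch (RuntimeException e) {
            // Swallow so that the scheduled task is not cancelled by one bad sample.
        }
    }

    /** Starts sampling in the background, waiting periodMillis between samples. */
    void start(long periodMillis) {
        scheduler.scheduleWithFixedDelay(this::refresh, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Returns the most recent sample without touching the process tree. */
    long getVmem() {
        return lastVmem.get();
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

The trade-off is that getVmem() becomes cheap but can be stale by up to one
period, whereas the on-demand option is always current but pays the scanning
cost on every call.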
> Prevent memory intensive user tasks from taking down nodes
> ----------------------------------------------------------
>
> Key: HADOOP-3581
> URL: https://issues.apache.org/jira/browse/HADOOP-3581
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Hemanth Yamijala
> Assignee: Vinod Kumar Vavilapalli
> Attachments: patch_3581_0.1.txt
>
>
> Sometimes user Map/Reduce applications can get extremely memory intensive,
> maybe due to some inadvertent bugs in the user code, or the amount of data
> processed. When this happens, the user tasks start to interfere with the
> proper execution of other processes on the node, including other Hadoop
> daemons like the DataNode and TaskTracker. Thus, the node would become
> unusable for any Hadoop tasks. There should be a way to prevent such tasks
> from bringing down the node.