[ 
https://issues.apache.org/jira/browse/HADOOP-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612833#action_12612833
 ] 

Vinod Kumar Vavilapalli commented on HADOOP-3581:
-------------------------------------------------

After discussing this with Hemanth, we concluded that the current code needs
reorganizing. I propose the following:

TaskTracker maintains a ProcessTree object for each task.

{code}
public abstract class ProcessTree {

 /* Initialize the process tree */
 public abstract void initialize();

 /* Destroy the process tree */
 public abstract void destroy();

 /* Return the total virtual memory used by this process tree, in bytes */
 public abstract long getVmem();

}
{code}

My previous code (ProcessTracker: initialize, kill and getCurrentVmemUsage) 
would move into a class LinuxProcessTree that extends ProcessTree. Cygwin 
appears to support the proc filesystem, so the same implementation might work 
on Windows, though this still needs to be confirmed. For Solaris and other 
OSes we will need separate classes extending ProcessTree.
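To make the Linux case concrete, here is a minimal sketch of the /proc-based vmem lookup such a LinuxProcessTree could build on. The class and method names are illustrative, not from the attached patch; the field index follows proc(5), where field 23 of /proc/<pid>/stat is vsize:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

/* Illustrative sketch only -- class/method names are not from the patch. */
public class ProcfsVmem {

  /*
   * Parse the vsize field (virtual memory in bytes) out of one line of
   * /proc/<pid>/stat. The command name (field 2) is parenthesized and may
   * itself contain spaces, so we split only after the closing ')'.
   */
  static long parseVmem(String statLine) {
    String afterComm = statLine.substring(statLine.lastIndexOf(')') + 2);
    String[] fields = afterComm.split(" ");
    /* afterComm starts at field 3 (state); vsize is field 23 overall. */
    return Long.parseLong(fields[20]);
  }

  /* Read /proc/<pid>/stat and return that process's virtual memory size. */
  static long vmemOf(int pid) throws IOException {
    BufferedReader in =
        new BufferedReader(new FileReader("/proc/" + pid + "/stat"));
    try {
      return parseVmem(in.readLine());
    } finally {
      in.close();
    }
  }
}
```

getVmem() for the whole tree would sum vmemOf() over every pid in the tree; building that pid list (walking ppid links under /proc) is what initialize() has to handle.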

Getting the pid of the task process: in the first patch, the implementation of 
getting a process's pid is hacky - as noted here 
(http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4244896), it reads the 
value of a private field by suppressing the standard Java language access 
checks. This won't work if a security policy prevents it. Instead of that 
implementation, we should let the TaskTracker ask the Task, via 
TaskUmbilicalProtocol, to report its pid through a getPid call:
{code}
public Integer getPid();
{code}
The task itself should return its pid (a *NIX pid, or a Cygwin PID on Windows) 
to the TT, perhaps by calling native code.
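On the JVM side, one way the task could implement its end of getPid without native code is via the RuntimeMXBean name, which on Sun/HotSpot JVMs takes the form "pid@hostname". That format is not guaranteed by the J2SE spec, so this is only a sketch and native code remains the safe fallback:

```java
import java.lang.management.ManagementFactory;

/* Illustrative sketch; not the actual patch. */
public class PidLookup {

  /*
   * Return this JVM's pid, or null if the runtime name does not follow
   * the common (but unspecified) "pid@hostname" convention.
   */
  public static Integer getPid() {
    String name = ManagementFactory.getRuntimeMXBean().getName();
    int at = name.indexOf('@');
    if (at <= 0) {
      return null; /* unknown naming scheme: fall back to native code */
    }
    try {
      return Integer.valueOf(name.substring(0, at));
    } catch (NumberFormatException e) {
      return null;
    }
  }
}
```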

Side notes:
 - Should we reconstruct the process tree on demand whenever getVmem() is 
called, or should we start a thread that updates it periodically?
 - If Cygwin also supports process groups and sessions (I can see setpgid, 
setsid, etc. in Cygwin's POSIX-compatible API at 
http://cygwin.com/cygwin-api/compatibility.html#std-susv3 ), we might want to 
change the implementation of destroy. That should be a separate issue, in 
which we also modify how we start tasks.
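The second option in the first side note above could look like this sketch (class name and the refresh hook are hypothetical): a daemon thread refreshes a cached reading, so getVmem() never blocks on /proc parsing:

```java
import java.util.concurrent.atomic.AtomicLong;

/* Hypothetical sketch of the "update periodically" option. */
public class VmemMonitor extends Thread {

  private final AtomicLong cachedVmem = new AtomicLong();
  private final long intervalMs;

  public VmemMonitor(long intervalMs) {
    this.intervalMs = intervalMs;
    setDaemon(true); /* don't keep the TaskTracker JVM alive */
  }

  /* Placeholder for the real /proc walk over the process tree. */
  protected long readVmemFromProcfs() {
    return 0L;
  }

  /* One refresh cycle; also called from the loop in run(). */
  public void refreshOnce() {
    cachedVmem.set(readVmemFromProcfs());
  }

  /* Cheap, non-blocking read of the last sampled value. */
  public long getVmem() {
    return cachedVmem.get();
  }

  public void run() {
    while (!isInterrupted()) {
      refreshOnce();
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        return; /* destroy() can interrupt us to stop monitoring */
      }
    }
  }
}
```

The trade-off is staleness: a getVmem() reading can be up to intervalMs old, which may matter if the TT uses it to decide when to kill a runaway task.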


> Prevent memory intensive user tasks from taking down nodes
> ----------------------------------------------------------
>
>                 Key: HADOOP-3581
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3581
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Hemanth Yamijala
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: patch_3581_0.1.txt
>
>
> Sometimes user Map/Reduce applications can get extremely memory intensive, 
> maybe due to some inadvertent bugs in the user code, or the amount of data 
> processed. When this happens, the user tasks start to interfere with the 
> proper execution of other processes on the node, including other Hadoop 
> daemons like the DataNode and TaskTracker. Thus, the node would become 
> unusable for any Hadoop tasks. There should be a way to prevent such tasks 
> from bringing down the node.
