[
https://issues.apache.org/jira/browse/HADOOP-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vinod Kumar Vavilapalli updated HADOOP-3581:
--------------------------------------------
Attachment: HADOOP-3581.6.0.txt
Attaching a new patch. This one assumes HADOOP-3759 is in; even after that, it
still needs minor merges in two places. I also made a few changes:
- We used to start the TaskMemoryManagerThread unconditionally; it would run
and then disable itself if memory management was not enabled. This behaviour is
now changed so that the thread is created only when required.
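The conditional start-up can be sketched as below. This is only an illustrative
sketch, not the actual TaskTracker code; the class name, the DISABLED sentinel,
and the method signature are all assumptions for illustration.

```java
// Hypothetical sketch: the memory-manager thread is created only when a
// per-task memory limit is actually configured, instead of always starting
// it and letting it disable itself.
public class MemoryManagerSketch {
    // Illustrative sentinel meaning "memory management disabled".
    static final long DISABLED = -1L;

    static Thread maybeStartMemoryManager(long maxMemoryPerTask) {
        if (maxMemoryPerTask == DISABLED) {
            return null; // memory management disabled: no thread at all
        }
        Thread t = new Thread(() -> {
            // real code would periodically walk task process trees here
        }, "TaskMemoryManagerThread");
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) {
        System.out.println(maybeStartMemoryManager(DISABLED) == null);
        System.out.println(maybeStartMemoryManager(512L * 1024 * 1024) != null);
    }
}
```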
- In the earlier patch, we sent SIGTERM first, slept for some interval, and
then sent SIGKILL. This had the problem that memory-overstepping tasks could
sneak in and finish during that wait. That is now prevented by moving the
{sleep, send SIGKILL} sequence to a new thread (SigKillThread). This way, we'll
have one thread per process tree to be killed, and the number of threads is
bounded by the total number of tasks that can run on a TT. This is nearly
foolproof; the only tasks that can still finish with over-quota memory are
those with a very short life span of under 300ms (the TaskMemoryManager sleep
interval), which is very unlikely.
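The per-process-tree kill sequence can be sketched roughly as follows. This is
a hypothetical illustration, not the patch's actual SigKillThread: the class,
sendSignal helper, and shelling out to kill(1) on the root pid alone are all
assumptions (the real code deals with the whole process tree).

```java
// Hypothetical sketch of the kill sequence described above: SIGTERM is sent,
// then a dedicated thread sleeps through the grace period before sending
// SIGKILL, so the monitoring loop is never blocked and the task cannot
// escape the eventual SIGKILL.
class SigKillThreadSketch extends Thread {
    private final int pid;
    private final long waitMillis;

    SigKillThreadSketch(int pid, long waitMillis) {
        this.pid = pid;
        this.waitMillis = waitMillis;
    }

    // Illustrative helper: shells out to kill(1) for a single pid; the real
    // patch would signal every process in the task's process tree.
    static void sendSignal(int pid, String signal) throws Exception {
        new ProcessBuilder("kill", "-" + signal, Integer.toString(pid))
                .start().waitFor();
    }

    @Override
    public void run() {
        try {
            sendSignal(pid, "TERM");  // give the task a chance to exit cleanly
            Thread.sleep(waitMillis); // grace period
            sendSignal(pid, "KILL");  // guarantee the process is gone
        } catch (Exception e) {
            // best effort; the process may already be dead
        }
    }
}
```

One such thread is spawned per process tree being killed, which is what bounds
the thread count by the TT's task slots.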
- Added another test case, TestTaskTrackerMemoryManager, which uses the MiniMR
and MiniDFS clusters. It adds two tests verifying that 1) tasks with memory
requirements within what the TT can offer run successfully without any errors,
and 2) tasks with memory requirements beyond what TTs can offer are really
killed. It also asserts that the error messages in the diagnostic information
are as expected and in the expected format.
- This patch also marks TIPs killed due to memory transgression as FAILED.
Earlier they were marked as KILLED, so the TIPs kept getting rescheduled and
the job could go on forever without finishing.
> Prevent memory intensive user tasks from taking down nodes
> ----------------------------------------------------------
>
> Key: HADOOP-3581
> URL: https://issues.apache.org/jira/browse/HADOOP-3581
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Hemanth Yamijala
> Assignee: Vinod Kumar Vavilapalli
> Attachments: HADOOP-3581.6.0.txt, patch_3581_0.1.txt,
> patch_3581_3.3.txt, patch_3581_4.3.txt, patch_3581_4.4.txt,
> patch_3581_5.0.txt, patch_3581_5.2.txt
>
>
> Sometimes user Map/Reduce applications can get extremely memory intensive,
> maybe due to some inadvertent bugs in the user code, or the amount of data
> processed. When this happens, the user tasks start to interfere with the
> proper execution of other processes on the node, including other Hadoop
> daemons like the DataNode and TaskTracker. Thus, the node would become
> unusable for any Hadoop tasks. There should be a way to prevent such tasks
> from bringing down the node.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.