[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-1119:
-------------------------------------

    Attachment: MAPREDUCE-1119.6.patch

Attaching a new patch that incorporates the code review suggestions above. 
{{DefaultTaskController}} is tested through a test-only subclass of 
TaskController that counts the calls made to {{dumpTaskStack()}} and verifies 
that the count increases only during the appropriate jobs. The same strategy 
is used to test {{LinuxTaskController}}: 
{{ClusterWithLinuxTaskController.MyLinuxTaskController}} now counts SIGQUIT 
calls as well as any exceptional exit statuses from {{task-controller}} when 
sending the SIGQUIT to the client. I also improved the documentation in 
{{ClusterWithLinuxTaskController}} on how to set up the test case.
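
For readers without the patch at hand, the call-counting strategy described 
above can be sketched roughly as follows. This is only an illustration, not 
the code in the patch: {{SketchTaskController}}, 
{{SketchDefaultTaskController}}, {{CountingTaskController}}, the 
{{dumpTaskStack(String)}} signature, and the task ID string are all 
hypothetical stand-ins, and the real Hadoop classes have a different and 
larger interface.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for Hadoop's TaskController (assumption: the real
// class has a different signature for dumpTaskStack and many more methods).
abstract class SketchTaskController {
  /** Ask the task's JVM to dump its thread stacks before it is killed. */
  abstract void dumpTaskStack(String taskId);
}

// Hypothetical stand-in for the default controller.
class SketchDefaultTaskController extends SketchTaskController {
  @Override
  void dumpTaskStack(String taskId) {
    // A real controller would signal the task process here (e.g. SIGQUIT).
    // The sketch intentionally does nothing.
  }
}

// Test-only subclass: records each call, then delegates to the parent.
class CountingTaskController extends SketchDefaultTaskController {
  private final AtomicInteger dumpCount = new AtomicInteger();

  @Override
  void dumpTaskStack(String taskId) {
    dumpCount.incrementAndGet();
    super.dumpTaskStack(taskId);
  }

  int getDumpCount() {
    return dumpCount.get();
  }
}

class StackDumpCountDemo {
  public static void main(String[] args) {
    CountingTaskController controller = new CountingTaskController();

    // Simulates a job whose tasks report status normally: no dump requested.
    if (controller.getDumpCount() != 0) {
      throw new AssertionError("no stack dump expected for a healthy job");
    }

    // Simulates a job whose task timed out: the tracker requests one dump.
    controller.dumpTaskStack("hypothetical_task_attempt_0");
    if (controller.getDumpCount() != 1) {
      throw new AssertionError("exactly one stack dump expected after timeout");
    }

    System.out.println("dump count: " + controller.getDumpCount());
  }
}
```

Counting in a subclass keeps the production controller free of test hooks; 
the test only has to compare the counter before and after each job.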

All of these tests pass on my local machine. 

> When tasks fail to report status, show task's stack dump before killing
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1119
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1119
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-1119.2.patch, MAPREDUCE-1119.3.patch, 
> MAPREDUCE-1119.4.patch, MAPREDUCE-1119.5.patch, MAPREDUCE-1119.6.patch, 
> MAPREDUCE-1119.patch
>
>
> When the TT kills tasks that haven't reported status, it should somehow 
> gather a stack dump for the task. This could be done either by sending a 
> SIGQUIT (so the dump ends up in stdout) or perhaps something like JDI to 
> gather the stack directly from Java. This may be somewhat tricky since the 
> child may be running as another user (so the SIGQUIT would have to go through 
> LinuxTaskController). This feature would make debugging these kinds of 
> failures much easier, especially if we could somehow get it into the 
> TaskDiagnostic message.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
