[
https://issues.apache.org/jira/browse/MAPREDUCE-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aaron Kimball updated MAPREDUCE-1119:
-------------------------------------
Attachment: MAPREDUCE-1119.3.patch
Attaching new patch to address code review issues.
* Renamed {{QUIT_TASK_JVM}} to {{SIGQUIT_TASK_JVM}}
* Task timeout causes {{SIGQUIT}}; other task kill events do not.
** Modified the calls in the task-kill call chain to pass along a
{{wasFailure}} bit.
** Modified all associated call sites to forward the existing {{wasFailure}}
bit, or to supply a new {{true}} or {{false}} as appropriate.
** Modified TestJobKillAndFail to distinguish between job-kill and task-timeout
failure conditions, and to check whether each one should produce a stack dump.
* If a SIGQUIT is sent before the SIGKILL, the SIGTERM is skipped.
* Refactored common code in ProcessTree
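
The signal ordering described above can be sketched roughly as follows. This is only an illustration, not code from the patch: the class {{KillSequenceSketch}}, the enum {{Signal}}, and the method {{signalsFor}} are hypothetical names, and the sketch assumes that a {{wasFailure}} value of {{true}} is what selects the SIGQUIT path and that the non-failure path is SIGTERM followed by SIGKILL.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from the patch.
class KillSequenceSketch {
    enum Signal { SIGQUIT, SIGTERM, SIGKILL }

    // Returns the signals to send, in order, when killing a task JVM.
    // Assumption: wasFailure == true (e.g. a task timeout) requests a stack dump.
    static List<Signal> signalsFor(boolean wasFailure) {
        List<Signal> signals = new ArrayList<>();
        if (wasFailure) {
            // SIGQUIT makes the JVM print a thread dump; SIGTERM is then skipped.
            signals.add(Signal.SIGQUIT);
        } else {
            // Other kill events get the usual polite termination request first.
            signals.add(Signal.SIGTERM);
        }
        // SIGKILL always comes last to make sure the process goes away.
        signals.add(Signal.SIGKILL);
        return signals;
    }

    public static void main(String[] args) {
        System.out.println("task timeout: " + signalsFor(true));
        System.out.println("job kill:     " + signalsFor(false));
    }
}
```

In a real TaskTracker there would also be a delay between the first signal and the SIGKILL; the sketch leaves that out and only shows which signals are chosen.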
> When tasks fail to report status, show task's stack dump before killing
> -----------------------------------------------------------------------
>
> Key: MAPREDUCE-1119
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1119
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: tasktracker
> Affects Versions: 0.22.0
> Reporter: Todd Lipcon
> Assignee: Aaron Kimball
> Attachments: MAPREDUCE-1119.2.patch, MAPREDUCE-1119.3.patch,
> MAPREDUCE-1119.patch
>
>
> When the TT kills tasks that haven't reported status, it should somehow
> gather a stack dump for the task. This could be done either by sending a
> SIGQUIT (so the dump ends up in stdout) or perhaps something like JDI to
> gather the stack directly from Java. This may be somewhat tricky since the
> child may be running as another user (so the SIGQUIT would have to go through
> LinuxTaskController). This feature would make debugging these kinds of
> failures much easier, especially if we could somehow get it into the
> TaskDiagnostic message.