[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774910#action_12774910
 ] 

Vinod K V commented on MAPREDUCE-1119:
--------------------------------------

bq. This currently causes stack traces for all killed tasks, right? I don't 
personally have a problem with that, but the description of the JIRA indicates 
that only those due to failing to report status will dump their stack, and it's 
worth noting the difference.
We shouldn't be doing this. A dump is needed only for tasks that the TaskTracker 
forcefully kills because it thinks the task in question is hung.

bq. in destroyTaskJVM, you sleep for sleeptimeBeforeSigkill in between the 
SIGQUIT and the SIGKILL. This seems wrong - it should probably be a different 
timeout.
That timeout was added so that the process can do whatever cleanup it wishes to 
do before a SIGKILL arrives. So I think it's fine to use the same 
configuration. But I think we should just skip SIGTERM and go directly to 
SIGKILL after SIGQUIT. We know for sure the task is hung and we are forcibly 
killing it, so skipping SIGTERM in this case will avoid further waiting before 
SIGKILL.
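
To make the proposed ordering concrete, here is a rough sketch of the signal sequence, not code from the patch. All names (KillPlanSketch, Step, plan) are hypothetical, and the SIGTERM-then-SIGKILL path for non-hung tasks is my assumption about the existing behaviour. A hung task gets SIGQUIT, the usual sleep so the JVM can write its thread dump, and then SIGKILL with no SIGTERM in between.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these names are illustrative and do not come from the patch.
final class KillPlanSketch {
    enum Signal { SIGQUIT, SIGTERM, SIGKILL }

    // One signal to send, plus how long to wait before sending the next one.
    static final class Step {
        final Signal signal;
        final long sleepAfterMs;

        Step(Signal signal, long sleepAfterMs) {
            this.signal = signal;
            this.sleepAfterMs = sleepAfterMs;
        }
    }

    // Builds the ordered list of signals for a task that is being killed.
    // Assumption: non-hung tasks keep the SIGTERM -> sleep -> SIGKILL sequence.
    static List<Step> plan(boolean taskIsHung, long sleepTimeBeforeSigkillMs) {
        List<Step> steps = new ArrayList<>();
        if (taskIsHung) {
            // Ask the JVM for a thread dump, then give it time to write the dump.
            steps.add(new Step(Signal.SIGQUIT, sleepTimeBeforeSigkillMs));
        } else {
            // Normal kill: let the process clean up before the SIGKILL arrives.
            steps.add(new Step(Signal.SIGTERM, sleepTimeBeforeSigkillMs));
        }
        // In both cases the final signal is SIGKILL, with nothing to wait for after it.
        steps.add(new Step(Signal.SIGKILL, 0L));
        return steps;
    }
}
```

With this shape, a hung task waits only once (between SIGQUIT and SIGKILL), which is the point of skipping SIGTERM.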

bq. I'd prefer SIGQUIT_TASK_JVM rather than QUIT_TASK_JVM for clarity's sake. 
It's a little less consistent, but more obvious for people reading the code 
later on.
+1. Either that or CORE_DUMP_TASK_JVM. No bias though.

bq. LinuxTaskController.finishTask is now sort of a misnomer, since you're 
using it to send SIGQUIT. Maybe rename to sendKillSignal or something?
+1

I have one more point. We need a dump of the task JVM itself, and (I think) not 
of the JVM's child processes. So no sigQuitProcessGroup()? Thoughts?
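
For reference, the difference between the two options comes down to the target passed to kill: a positive pid signals one process, a negated process group id signals every member of the group. A small sketch, assuming the POSIX kill utility is what ends up being invoked; SigQuitCommandSketch and sigQuitCommand are hypothetical names, and this ignores the LinuxTaskController path entirely.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: builds the argument list for a POSIX `kill` invocation.
final class SigQuitCommandSketch {
    // pid is the task JVM's pid; it is treated as the process group id when
    // wholeProcessGroup is true (assumes the JVM is the group leader).
    static List<String> sigQuitCommand(int pid, boolean wholeProcessGroup) {
        if (pid <= 0) {
            throw new IllegalArgumentException("pid must be positive: " + pid);
        }
        // A negative target tells kill to signal the whole process group.
        String target = wholeProcessGroup ? "-" + pid : Integer.toString(pid);
        // "--" ends option parsing so a negative target is not read as an option.
        return Arrays.asList("kill", "-s", "QUIT", "--", target);
    }
}
```

Passing false here gives the behaviour I am suggesting: only the task JVM prints its stack, and any child processes are left alone until the SIGKILL.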

> When tasks fail to report status, show tasks's stack dump before killing
> ------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1119
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1119
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-1119.2.patch, MAPREDUCE-1119.patch
>
>
> When the TT kills tasks that haven't reported status, it should somehow 
> gather a stack dump for the task. This could be done either by sending a 
> SIGQUIT (so the dump ends up in stdout) or perhaps something like JDI to 
> gather the stack directly from Java. This may be somewhat tricky since the 
> child may be running as another user (so the SIGQUIT would have to go through 
> LinuxTaskController). This feature would make debugging these kinds of 
> failures much easier, especially if we could somehow get it into the 
> TaskDiagnostic message

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.