[ https://issues.apache.org/jira/browse/MAPREDUCE-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776896#action_12776896 ]
Vinod K V commented on MAPREDUCE-1119:
--------------------------------------

We can avoid the uncertainty about dumping the stack in the case of a Child exception by calling {{taskController.dumpTaskStack(context)}} directly from inside {{TaskTracker.markUnresponsiveTasks()}}, immediately before {{purgeTask(tip,true)}}. This will create the dump only when it is absolutely needed. Would that work?

To construct the context, you may need a bridging method inside {{JvmManager}} which can itself call {{taskController.dumpTaskStack(context)}}.

That will make the patch a lot simpler, and will avoid many other unnecessary changes to {{JvmManager}}/{{TaskController}}.

If you wish, I can upload a demonstrating patch. Thoughts?

> When tasks fail to report status, show tasks's stack dump before killing
> ------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1119
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1119
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-1119.2.patch, MAPREDUCE-1119.3.patch, MAPREDUCE-1119.4.patch, MAPREDUCE-1119.patch
>
>
> When the TT kills tasks that haven't reported status, it should somehow
> gather a stack dump for the task. This could be done either by sending a
> SIGQUIT (so the dump ends up in stdout) or perhaps something like JDI to
> gather the stack directly from Java. This may be somewhat tricky since the
> child may be running as another user (so the SIGQUIT would have to go through
> LinuxTaskController). This feature would make debugging these kinds of
> failures much easier, especially if we could somehow get it into the
> TaskDiagnostic message.
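
For illustration only, the call order proposed in the comment above might look roughly like the sketch below. This is not the actual patch and does not use the real Hadoop classes: {{TaskControllerSketch}}, {{JvmManagerSketch}}, {{TaskTrackerSketch}}, the {{dumpStack}} bridging method, the string-valued context, and the timeout bookkeeping are all hypothetical stand-ins. Only the names {{dumpTaskStack}}, {{markUnresponsiveTasks}} and {{purgeTask}} come from the comment, and their signatures here are simplified assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for TaskController; it only records what it was asked to do.
class TaskControllerSketch {
    private final List<String> events;

    TaskControllerSketch(List<String> events) {
        this.events = events;
    }

    // Stand-in for taskController.dumpTaskStack(context). A real controller
    // would ask the child JVM for a stack dump (for example via SIGQUIT).
    void dumpTaskStack(String context) {
        events.add("dump:" + context);
    }
}

// Hypothetical stand-in for JvmManager with the suggested bridging method.
class JvmManagerSketch {
    private final TaskControllerSketch taskController;

    JvmManagerSketch(TaskControllerSketch taskController) {
        this.taskController = taskController;
    }

    // Bridging method (name assumed): builds the context for the task,
    // then delegates to the task controller.
    void dumpStack(String taskId) {
        String context = "context-for-" + taskId; // assumed context shape
        taskController.dumpTaskStack(context);
    }
}

// Hypothetical stand-in for TaskTracker.
class TaskTrackerSketch {
    private final List<String> events;
    private final JvmManagerSketch jvmManager;

    TaskTrackerSketch(List<String> events) {
        this.events = events;
        this.jvmManager = new JvmManagerSketch(new TaskControllerSketch(events));
    }

    // For every task whose last status report is older than the timeout,
    // dump its stack immediately before purging it.
    void markUnresponsiveTasks(Map<String, Long> lastReportTimes, long now, long timeoutMs) {
        for (Map.Entry<String, Long> entry : lastReportTimes.entrySet()) {
            if (now - entry.getValue() > timeoutMs) {
                jvmManager.dumpStack(entry.getKey());
                purgeTask(entry.getKey(), true);
            }
        }
    }

    // Stand-in for purgeTask(tip, true); only records the call.
    void purgeTask(String taskId, boolean wasFailure) {
        events.add("purge:" + taskId);
    }
}
```

The point of the arrangement is that the dump is requested only on the code path that is about to kill an unresponsive task, so tasks that are still reporting status are never signalled.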