[ https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144582#comment-15144582 ]

Eric Payne commented on MAPREDUCE-5044:
---------------------------------------

Hi [~jira.shegalov]. I would like to see this functionality implemented. We 
occasionally see containers time out, and it would be good if users could have 
direct feedback in the form of a jstack to help them debug their applications.

I have been coming up to speed on the work that's already been committed in 
this area under YARN-445 and its children. IIUC, YARN-445 and its children put 
in place the infrastructure for a {{Client -> RM -> NM -> Container}} signal 
path. On the other hand, this JIRA (along with YARN-1515) implements an {{AM -> 
NM -> Container}} signal path and the ability to send multiple signals per call.

It seems that these pieces could be split into separate JIRAs. Either way, I 
think a lot of what has been done in this JIRA could be reused to add an 
interface to {{ContainerManagementProtocol}} that would let the AM prompt the NM 
to signal the container to dump its stack before killing the container on a 
timeout.
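To make the intended order of operations concrete, here is a rough Java sketch of only the last step on the node: send SIGQUIT so the task JVM writes a thread dump, wait a short grace period, then force-kill it. This is an illustration under stated assumptions, not the MAPREDUCE-5044/YARN-1515 patches or any existing YARN API. The class {{DumpThenKill}}, its method, and the grace-period parameter are hypothetical, and it assumes a POSIX host with {{sh}} on the PATH and Java 9+ for {{ProcessHandle}}.

```java
import java.io.IOException;
import java.time.Duration;
import java.util.Optional;

/**
 * Hypothetical helper (not part of Hadoop/YARN): ask a process to dump its
 * threads with SIGQUIT, wait briefly, then force-kill it if it is still running.
 */
final class DumpThenKill {
    private DumpThenKill() {}

    static void dumpThenKill(long pid, Duration grace)
            throws IOException, InterruptedException {
        Optional<ProcessHandle> maybe = ProcessHandle.of(pid);
        if (!maybe.isPresent()) {
            return; // process already gone, nothing to dump or kill
        }
        ProcessHandle target = maybe.get();

        // The JDK has no API for sending SIGQUIT, so use the POSIX shell's kill.
        Process quit = new ProcessBuilder("sh", "-c", "kill -s QUIT " + pid)
                .inheritIO()
                .start();
        quit.waitFor();

        // A JVM prints a thread dump to stdout on SIGQUIT and keeps running;
        // give it time to finish writing the dump.
        Thread.sleep(grace.toMillis());

        if (target.isAlive()) {
            // On POSIX systems destroyForcibly() sends SIGKILL.
            target.destroyForcibly();
            target.onExit().join(); // wait until the process has actually exited
        }
    }
}
```

A JVM handles SIGQUIT by printing a thread dump to its stdout and continuing to run, so for a task container the dump should normally end up in the container's stdout log; the grace period only needs to be long enough for that write to finish before the kill.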

Is there a possibility that this JIRA will move forward? Ideally, we would like 
it all ported back to 2.7. Please let me know if there's anything I can do.

> Have AM trigger jstack on task attempts that timeout before killing them
> ------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5044
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mr-am
>    Affects Versions: 2.1.0-beta
>            Reporter: Jason Lowe
>            Assignee: Gera Shegalov
>         Attachments: MAPREDUCE-5044.v01.patch, MAPREDUCE-5044.v02.patch, 
> MAPREDUCE-5044.v03.patch, MAPREDUCE-5044.v04.patch, MAPREDUCE-5044.v05.patch, 
> MAPREDUCE-5044.v06.patch, Screen Shot 2013-11-12 at 1.05.32 PM.png,
> Screen Shot 2013-11-12 at 1.06.04 PM.png
>
>
> When an AM expires a task attempt it would be nice if it triggered a jstack 
> output via SIGQUIT before killing the task attempt.  This would be invaluable 
> for helping users debug their hung tasks, especially if they do not have 
> shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
