[jira] [Updated] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them

Gera Shegalov (JIRA) Mon, 24 Feb 2014 12:03:48 -0800

     [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Gera Shegalov updated MAPREDUCE-5044:
-------------------------------------

    Attachment: MAPREDUCE-5044.v04.patch

v04 applies on top of YARN-1515.v05. It now makes sure that a thread dump is 
also created in uber mode. 

Added unit tests for a normal MR job and an uber MR job.

While working on this I realized that we actually need to discuss how 
mapreduce.task.timeout is treated in uber mode. Right now it is effectively 
ignored: the AM does not kill itself, and LocalContainerLauncher processes 
CONTAINER_REMOTE_CLEANUP inline with the stuck task in SubtaskRunner. The 
liveness monitor for the AM in the RM does not catch the problem either, because 
RMCommunicator heartbeats in a separate allocator thread. 

I am considering two options:
- move heartbeat() into SubtaskRunner for uber mode, so that the liveness 
monitor catches the stuck uber task.
- call System.exit(errorcode) when TA_TIMEOUT occurs.
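
To illustrate why the first option would help, here is a minimal sketch using a 
logical clock instead of real threads. It is not Hadoop code: the Monitor class, 
the simulate() method and all parameter names are hypothetical stand-ins for the 
RM liveness monitor and the AM heartbeat. It only shows that a heartbeat sent 
independently of the task keeps the monitor satisfied forever, while a heartbeat 
tied to the task's progress lets the monitor expire the AM.

```java
public class UberLivenessSketch {

    /** Hypothetical liveness monitor: expires when no ping arrives within the timeout. */
    static final class Monitor {
        private final long timeoutMs;
        private long lastPingMs;

        Monitor(long timeoutMs, long nowMs) {
            this.timeoutMs = timeoutMs;
            this.lastPingMs = nowMs;
        }

        void ping(long nowMs) {
            lastPingMs = nowMs;
        }

        boolean isExpired(long nowMs) {
            return nowMs - lastPingMs > timeoutMs;
        }
    }

    /**
     * Steps a logical clock in heartbeat intervals. The task is considered
     * stuck from stuckAtMs onward. If heartbeatInTaskThread is true, pings
     * stop once the task is stuck; otherwise they continue regardless.
     *
     * @return the time at which the monitor expires, or -1 if it never does
     */
    static long simulate(boolean heartbeatInTaskThread, long stuckAtMs,
                         long intervalMs, long timeoutMs, long endMs) {
        Monitor monitor = new Monitor(timeoutMs, 0L);
        for (long now = intervalMs; now <= endMs; now += intervalMs) {
            boolean taskStuck = now >= stuckAtMs;
            if (!heartbeatInTaskThread || !taskStuck) {
                monitor.ping(now);
            }
            if (monitor.isExpired(now)) {
                return now;
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        // Heartbeat in a separate thread: the stuck task goes unnoticed.
        System.out.println(simulate(false, 3000, 1000, 5000, 20000));
        // Heartbeat tied to the task thread: the monitor eventually expires.
        System.out.println(simulate(true, 3000, 1000, 5000, 20000));
    }
}
```

The second option (exiting on TA_TIMEOUT) would not depend on the monitor at 
all, at the cost of the AM terminating itself directly.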

 

> Have AM trigger jstack on task attempts that timeout before killing them
> ------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5044
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mr-am
>    Affects Versions: 2.1.0-beta
>            Reporter: Jason Lowe
>            Assignee: Gera Shegalov
>         Attachments: MAPREDUCE-5044.v01.patch, MAPREDUCE-5044.v02.patch, 
> MAPREDUCE-5044.v03.patch, MAPREDUCE-5044.v04.patch, Screen Shot 2013-11-12 at 
> 1.05.32 PM.png, Screen Shot 2013-11-12 at 1.06.04 PM.png
>
>
> When an AM expires a task attempt it would be nice if it triggered a jstack 
> output via SIGQUIT before killing the task attempt.  This would be invaluable 
> for helping users debug their hung tasks, especially if they do not have 
> shell access to the nodes.
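
For readers unfamiliar with the mechanism, the request above amounts to 
"SIGQUIT, wait briefly, then kill". The following is a rough sketch of that 
sequence for a POSIX system, not the code in the attached patches: the class 
name, method names and the grace period parameter are all assumptions made for 
illustration, and it shells out to the standard kill command rather than using 
any YARN API.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class DumpThenKillSketch {

    /** Builds a POSIX kill command line, e.g. [kill, -QUIT, 1234]. */
    static List<String> signalCommand(String signal, long pid) {
        return Arrays.asList("kill", "-" + signal, Long.toString(pid));
    }

    /** Runs the kill command and returns its exit code. */
    static int send(String signal, long pid) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(signalCommand(signal, pid)).inheritIO().start();
        return p.waitFor();
    }

    /**
     * Sends SIGQUIT so that a HotSpot JVM prints a thread dump to its stdout,
     * waits graceMs (a hypothetical grace period) for the dump to be written,
     * and then sends SIGKILL.
     */
    static void dumpThenKill(long pid, long graceMs) throws IOException, InterruptedException {
        send("QUIT", pid);
        TimeUnit.MILLISECONDS.sleep(graceMs);
        send("KILL", pid);
    }
}
```

Because the dump goes to the task's stdout, it would end up in the task logs, 
which is what makes it useful to users without shell access.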



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
