[ https://issues.apache.org/jira/browse/MAPREDUCE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haibo Chen reassigned MAPREDUCE-4955:
-------------------------------------

    Assignee: Haibo Chen

> NM container diagnostics for excess resource usage can be lost if task fails 
> while being killed 
> ------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4955
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4955
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.3-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Haibo Chen
>
> When a nodemanager kills a container for exceeding its resource budget, it 
> provides a diagnostics message in the container status explaining why the 
> container was killed.  However, this message can be lost if the task fails 
> during the shutdown triggered by the SIGTERM (e.g., lost DFS leases because 
> the filesystem was closed) and notifies the AM via the task umbilical 
> *before* the AM receives the NM's container status message via the RM 
> heartbeat.
> In that case the task attempt fails with the task's own failure diagnostic, 
> and the user is left wondering why the task really failed, because the NM's 
> diagnostics arrive too late, are not written to the history file, and are 
> lost.  If the AM receives the container status via the RM heartbeat before 
> the task fails during shutdown, the diagnostics are written properly to the 
> history file and the user can see why the task failed.
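
The ordering problem described above can be illustrated with a small, self-contained toy model. This is only a sketch and not the actual MR AM code: the class `ToyAttempt`, the method `report`, and both diagnostic strings are hypothetical names made up for illustration. The model assumes that whichever report reaches the AM first finalizes the attempt's history record and that anything arriving afterwards is discarded.

```java
import java.util.ArrayList;
import java.util.List;

public class DiagnosticsRaceDemo {

    /** Hypothetical stand-in for a task attempt tracked by the AM. */
    static class ToyAttempt {
        private String historyRecord; // null until the attempt is finalized
        private final List<String> dropped = new ArrayList<>();

        /** Assumption: the first report finalizes the attempt; later ones are discarded. */
        void report(String diagnostics) {
            if (historyRecord == null) {
                historyRecord = diagnostics;
            } else {
                dropped.add(diagnostics);
            }
        }

        String historyRecord() { return historyRecord; }
        List<String> dropped() { return dropped; }
    }

    // Made-up messages; real NM and task diagnostics look different.
    static final String NM_DIAG = "NM: container killed for exceeding resource budget";
    static final String TASK_DIAG = "task: failed during shutdown, filesystem closed";

    /** Replays the two reports in the requested arrival order. */
    static ToyAttempt run(boolean nmStatusArrivesFirst) {
        ToyAttempt attempt = new ToyAttempt();
        if (nmStatusArrivesFirst) {
            attempt.report(NM_DIAG);   // container status via the RM heartbeat
            attempt.report(TASK_DIAG); // task failure via the umbilical
        } else {
            attempt.report(TASK_DIAG); // umbilical wins the race
            attempt.report(NM_DIAG);   // arrives too late
        }
        return attempt;
    }

    public static void main(String[] args) {
        for (boolean nmFirst : new boolean[] {false, true}) {
            ToyAttempt a = run(nmFirst);
            System.out.println("NM status first = " + nmFirst);
            System.out.println("  history: " + a.historyRecord());
            System.out.println("  dropped: " + a.dropped());
        }
    }
}
```

When the umbilical report arrives first, the NM message ends up in the dropped list, which corresponds to the lost diagnostics in this issue. A fix would presumably need to keep accepting NM diagnostics after the task reports its own failure, but how that is done in the real state machine is outside what this sketch shows.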



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
