[
https://issues.apache.org/jira/browse/MAPREDUCE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Haibo Chen reassigned MAPREDUCE-4955:
-------------------------------------
Assignee: Haibo Chen
> NM container diagnostics for excess resource usage can be lost if task fails
> while being killed
> ------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-4955
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4955
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am
> Affects Versions: 2.0.3-alpha, 0.23.5
> Reporter: Jason Lowe
> Assignee: Haibo Chen
>
> When a nodemanager kills a container for exceeding its resource budget, it
> provides a diagnostics message in the container status explaining why it was
> killed. However, this message can be lost if the task fails during the
> shutdown triggered by SIGTERM (e.g., lost DFS leases because the filesystem
> was closed) and notifies the AM via the task umbilical *before* the AM
> receives the NM's container status message via the RM heartbeat.
> In that case, the task attempt fails with the task's own failure diagnostic,
> and the user is left wondering why the task failed, because the NM's
> diagnostics arrive too late, are not written to the history file, and are
> lost. If the AM receives the container status via the RM heartbeat before
> the task fails during shutdown, the diagnostics are written properly to
> the history file and the user can see why the task failed.
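
The ordering problem described above can be illustrated with a small
sketch. This is not Hadoop code: AttemptModel, its methods, and both
diagnostic strings are hypothetical names made up for illustration, and
the model assumes (as the description suggests) that diagnostics arriving
after the attempt has already failed never reach the history file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a task attempt as seen by the AM.
// It is NOT the real MapReduce AM state machine.
final class AttemptModel {
    enum State { RUNNING, FAILED }

    private State state = State.RUNNING;
    private final List<String> history = new ArrayList<>(); // written to the history file
    private final List<String> lost = new ArrayList<>();    // arrived too late

    // Container status from the NM, relayed through the RM heartbeat.
    void onContainerStatus(String nmDiagnostics) {
        if (state == State.FAILED) {
            lost.add(nmDiagnostics); // attempt already failed: too late to record
            return;
        }
        history.add(nmDiagnostics);
    }

    // Failure reported by the task itself over the umbilical.
    void onTaskFailure(String taskDiagnostics) {
        if (state == State.FAILED) {
            return;
        }
        history.add(taskDiagnostics);
        state = State.FAILED; // assumption: history is finalized at this point
    }

    List<String> history() { return history; }
    List<String> lost() { return lost; }

    // Replays the two events in one of the two possible arrival orders.
    static AttemptModel replay(boolean umbilicalFirst) {
        String nm = "container killed for exceeding resource limits"; // hypothetical text
        String task = "filesystem closed during shutdown";            // hypothetical text
        AttemptModel attempt = new AttemptModel();
        if (umbilicalFirst) {
            attempt.onTaskFailure(task);
            attempt.onContainerStatus(nm);
        } else {
            attempt.onContainerStatus(nm);
            attempt.onTaskFailure(task);
        }
        return attempt;
    }

    public static void main(String[] args) {
        for (boolean umbilicalFirst : new boolean[] { true, false }) {
            AttemptModel a = replay(umbilicalFirst);
            System.out.println("umbilicalFirst=" + umbilicalFirst
                + " history=" + a.history() + " lost=" + a.lost());
        }
    }
}
```

When the umbilical report arrives first, the NM's explanation ends up only
in the lost list; when the heartbeat arrives first, both messages are
recorded. A fix would presumably need to either delay finalizing the
failure or accept late diagnostics, but that choice is outside this sketch.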
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)