[ https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15446112#comment-15446112 ]
Jason Lowe commented on MAPREDUCE-6771:
---------------------------------------
Thanks for the report and patch! This is closely related to MAPREDUCE-4955.
I think the patch is necessary but not sufficient to fix the problem. As noted in
MAPREDUCE-4955, the task can fail due to the SIGTERM and report that failure via
the umbilical before the container completion event arrives at the AM. At that
point the task attempt is already dead from the AM's perspective and the .jhist
entry has already been recorded, so the extra diagnostics have nowhere to go.
Either the AM needs to postpone recording the attempt completion event until it
receives the container completion event, so it can see whether there are any
diagnostics, or there needs to be a way to record postmortem diagnostics for
attempts in the .jhist file.
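
To make the ordering problem concrete, here is a minimal toy model of the race
and of the first option (deferring the record until the container completion
event arrives). It is only a sketch, not Hadoop code: the class ToyAppMaster,
its method names, the attempt ID and the diagnostic strings are all
hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyAppMaster:
    """Toy model of the AM's bookkeeping (hypothetical, not the real MR AM)."""
    defer_until_container_exit: bool = False
    # attempt id -> diagnostics as written to the (simulated) .jhist file
    jhist: Dict[str, List[str]] = field(default_factory=dict)
    # failures reported by the task that are waiting for the container event
    pending: Dict[str, List[str]] = field(default_factory=dict)

    def on_umbilical_failure(self, attempt: str, diag: str) -> None:
        """The task reports its own failure, e.g. after receiving SIGTERM."""
        if self.defer_until_container_exit:
            self.pending[attempt] = [diag]
        else:
            self.jhist[attempt] = [diag]  # the entry is final from here on

    def on_container_completed(self, attempt: str, diag: str) -> None:
        """Container status with NM diagnostics arrives via the RM heartbeat."""
        if attempt in self.jhist:
            return  # attempt already recorded, so the NM diagnostics are dropped
        self.jhist[attempt] = self.pending.pop(attempt, []) + [diag]


def run(defer: bool) -> List[str]:
    """Replay the problematic ordering: umbilical failure first, container event second."""
    am = ToyAppMaster(defer_until_container_exit=defer)
    am.on_umbilical_failure("attempt_0", "task received SIGTERM")
    am.on_container_completed("attempt_0", "container killed: over memory limit")
    return am.jhist["attempt_0"]
```

With run(False) the recorded entry contains only the task's own message, which
is the loss described above. With run(True) the NM diagnostics are appended
before the entry is written. The second option, recording postmortem
diagnostics, would instead amend an entry that is already in the file.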
> Diagnostics information can be lost in .jhist if task containers are killed
> by Node Manager.
> --------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-6771
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6771
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 2.7.3
> Reporter: Haibo Chen
> Assignee: Haibo Chen
> Attachments: mapreduce6771.001.patch
>
>
> Task containers can exceed their resource limits and be killed by the Node
> Manager. The MR AM is then notified of the container status and diagnostics
> information through its heartbeat with the RM. However, the diagnostics
> information may never make it into the .jhist file, so when the job completes,
> the diagnostics information associated with the failed task attempts is empty.
> This makes it hard for users to root-cause job failures, which are often
> caused by memory leaks.