[
https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Lowe updated MAPREDUCE-6771:
----------------------------------
Summary: RMContainerAllocator sends container diagnostics event after
corresponding completion event (was: Diagnostics information can be lost in
.jhist if task containers are killed by Node Manager.)
Yep, let's get an incremental improvement while the complete solution is being
thought out. I only brought up MAPREDUCE-4955 because the original summary
implied this would fix all cases of the container diagnostic being dropped.
Updating the headline to reflect the reduction in scope.
> RMContainerAllocator sends container diagnostics event after corresponding
> completion event
> --------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-6771
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6771
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 2.7.3
> Reporter: Haibo Chen
> Assignee: Haibo Chen
> Attachments: TaUnsuccessfullyEventEmission.jpg,
> mapreduce6771.001.patch
>
>
> Task containers can exceed their resource limits and be killed by the Node
> Manager. The MR AM is then notified of the container status and diagnostics
> information through its heartbeat with the RM. However, the diagnostics
> information may never make it into the .jhist file, so when the job completes,
> the diagnostics information associated with the failed task attempts is empty.
> This makes it hard for users to root-cause job failures, which are often
> caused by memory leaks.
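
The ordering problem named in the summary can be illustrated with a toy model. This is only a sketch, not Hadoop code: the class and method names (EventOrderingSketch, Attempt, onDiagnostics, onCompleted) are hypothetical, and it assumes, as a simplification, that an attempt stops accepting diagnostics once it has handled its completion event.

```java
import java.util.ArrayList;
import java.util.List;

public class EventOrderingSketch {

    /** Hypothetical stand-in for a task attempt; not a real Hadoop class. */
    static final class Attempt {
        private final List<String> diagnostics = new ArrayList<>();
        private boolean finished = false;

        /** Assumption: diagnostics that arrive after completion are dropped. */
        void onDiagnostics(String message) {
            if (!finished) {
                diagnostics.add(message);
            }
        }

        /** Marks the attempt finished and returns what would be recorded. */
        String onCompleted() {
            finished = true;
            return String.join("; ", diagnostics);
        }
    }

    /** Delivers the two events in the chosen order and returns the recorded text. */
    static String recordedDiagnostics(boolean completionFirst) {
        Attempt attempt = new Attempt();
        String message = "Container killed: over memory limit"; // made-up text
        String recorded;
        if (completionFirst) {
            // Order described in the summary: completion, then diagnostics.
            recorded = attempt.onCompleted();
            attempt.onDiagnostics(message);
        } else {
            // Reversed order: diagnostics are stored before completion.
            attempt.onDiagnostics(message);
            recorded = attempt.onCompleted();
        }
        return recorded;
    }

    public static void main(String[] args) {
        System.out.println("completion first:  [" + recordedDiagnostics(true) + "]");
        System.out.println("diagnostics first: [" + recordedDiagnostics(false) + "]");
    }
}
```

In this model, delivering the completion event first leaves the recorded diagnostics empty, which matches the symptom in the description; delivering the diagnostics first preserves them.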