[jira] [Updated] (MAPREDUCE-6771) RMContainerAllocator sends container diagnostics event after corresponding completion event

Jason Lowe (JIRA) Wed, 31 Aug 2016 13:18:30 -0700

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jason Lowe updated MAPREDUCE-6771:
----------------------------------
    Summary: RMContainerAllocator sends container diagnostics event after 
corresponding completion event  (was: Diagnostics information can be lost in 
.jhist if task containers are killed by Node Manager.)

Yep, let's get an incremental improvement while the complete solution is being 
thought out.  I only brought up MAPREDUCE-4955 because the original summary 
implied this would fix all cases of the container diagnostic being dropped.  
Updating the headline to reflect the reduction in scope.

> RMContainerAllocator sends container diagnostics event after corresponding 
> completion event
> --------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6771
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6771
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.7.3
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>         Attachments: TaUnsuccessfullyEventEmission.jpg, 
> mapreduce6771.001.patch
>
>
> Task containers can exceed their resource limits and be killed by the Node 
> Manager.  The MR AM is then notified of the container status and diagnostics 
> information through its heartbeat with the RM.  However, the diagnostics 
> information may never reach the .jhist file, so when the job completes, the 
> diagnostics associated with the failed task attempts are empty.  
> This makes it hard for users to find the root cause of job failures, which 
> are often caused by memory leaks.
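
To illustrate why the event order in the new summary matters, here is a toy 
model in Python, not Hadoop code.  It assumes a task attempt writes its 
history record once, at completion time, from whatever diagnostics it has 
received so far.  The class, the event names, and the message text are all 
hypothetical stand-ins chosen for the sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ToyTaskAttempt:
    """Hypothetical stand-in for an MR task attempt; not a Hadoop class."""
    diagnostics: List[str] = field(default_factory=list)
    history_record: Optional[str] = None  # models what would land in .jhist

    def handle(self, event_type: str, payload: str = "") -> None:
        if event_type == "DIAGNOSTICS_UPDATE":
            self.diagnostics.append(payload)
        elif event_type == "CONTAINER_COMPLETED":
            # Assumption: the record is written once, at completion time,
            # from the diagnostics seen so far. Later updates are not added.
            if self.history_record is None:
                self.history_record = "; ".join(self.diagnostics)


def replay(events: List[Tuple[str, str]]) -> Optional[str]:
    """Feed events to a fresh attempt in order; return its history record."""
    attempt = ToyTaskAttempt()
    for event_type, payload in events:
        attempt.handle(event_type, payload)
    return attempt.history_record


if __name__ == "__main__":
    msg = "container killed for exceeding memory limit"  # illustrative text
    completion_first = [("CONTAINER_COMPLETED", ""), ("DIAGNOSTICS_UPDATE", msg)]
    diagnostics_first = [("DIAGNOSTICS_UPDATE", msg), ("CONTAINER_COMPLETED", "")]
    print(repr(replay(completion_first)))   # empty record: diagnostics came too late
    print(repr(replay(diagnostics_first)))  # record carries the diagnostics
```

Under that assumption, sending the diagnostics event before the completion 
event is enough for the message to appear in the record.  This only models 
the ordering issue named in the summary; it says nothing about the other 
cases tracked in MAPREDUCE-4955.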



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

