[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma resolved MAPREDUCE-6135.
--------------------------------
    Resolution: Duplicate

Thanks, Jason. Resolving this as a duplicate. Will continue the discussion over 
at MAPREDUCE-5502. It looks like Robert also mentioned the approach of 
rerunning the AM for cleanup in MAPREDUCE-4428.

> Job staging directory remains if MRAppMaster is OOM
> ---------------------------------------------------
>
>                 Key: MAPREDUCE-6135
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6135
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Ming Ma
>
> If MRAppMaster attempts run out of memory, they won't go through the normal 
> job cleanup process that moves history files to the history server location. 
> When customers try to find out why the job failed, the data won't be 
> available on the history server web UI.
> The workaround is to extract the container id and NM id from the jhist file 
> in the job staging directory, then use the "yarn logs" command to get the AM 
> logs.
> It would be great if the platform could take care of this by moving these 
> jhist files to the history server automatically when AM attempts don't exit 
> properly.
> We are discussing ideas on how to address this and would like to get 
> suggestions from others. Not sure if the timeline server design covers this 
> scenario.
> 1. Define some protocol for YARN to tell the AppMaster "you have exceeded 
> the AM max attempts, please clean up". For example, YARN could launch the 
> AppMaster one more time after the max attempts are exhausted, and 
> MRAppMaster would use that as the indication that this is a cleanup-only 
> attempt.
> 2. Have some program periodically check job statuses and move files from 
> the job staging directory to the history server for those finished jobs.
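
As an illustration of the workaround described in the quoted report (this 
sketch is not from the original issue), the following rough Java snippet reads 
the .jhist file left in the job staging directory and prints the AM container 
id and NodeManager address, which can then be passed to 
"yarn logs -applicationId ... -containerId ... -nodeAddress ...". It assumes 
the JobHistoryParser API from org.apache.hadoop.mapreduce.jobhistory; the 
class name and argument handling are illustrative only.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser;
    import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.AMInfo;
    import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.JobInfo;

    // Illustrative helper, not part of the original report.
    public class PrintAmContainer {
      public static void main(String[] args) throws Exception {
        // args[0]: path to the .jhist file in the job staging directory
        Path jhist = new Path(args[0]);
        FileSystem fs = jhist.getFileSystem(new Configuration());
        JobInfo info = new JobHistoryParser(fs, jhist).parse();
        for (AMInfo am : info.getAMInfos()) {
          // The AM-started record is written early, so this information is
          // usually present even when the AM attempt later died with OOM.
          System.out.println("attempt=" + am.getAppAttemptId()
              + " container=" + am.getContainerId()
              + " nm=" + am.getNodeManagerHost() + ":" + am.getNodeManagerPort());
        }
      }
    }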



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
