[jira] [Commented] (MAPREDUCE-4993) AM thinks it was killed when an error occurs setting up a task container launch context

Jason Lowe (JIRA) Sat, 02 Mar 2013 09:37:18 -0800

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591455#comment-13591455
 ]


Jason Lowe commented on MAPREDUCE-4993:
---------------------------------------

I'm not exactly sure what happened in this case, as I'm just documenting the
poor error handling by the AM on a job I was asked to analyze. From the
stack trace it looks like the AM was trying to set up the common portion of the
task launch contexts and encountered an IOException while processing
distributed cache files because they had been deleted. Maybe someone submitted a
job whose distributed cache files in HDFS were deleted while the job was still
in flight?

Anyway, the problem is, as you point out, that the AM does not properly handle
exceptions while setting up the common container launch context for tasks. If
an error occurs during that setup, the AM should fail the job with job
diagnostics containing the exception message and stack trace rather than simply
exiting with no diagnostics.
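A minimal sketch of the handling described above, written in Python for brevity. Every name in it (JobState, JobSketch, create_common_launch_context, setup_common_context) is hypothetical and is not the MapReduce AM's actual code or API; a deleted distributed cache file is simulated with an `exists` callback. The point it shows is that the setup error is caught and turned into a FAILED job whose diagnostics carry the message and stack trace, instead of the AM exiting with no diagnostics.

```python
import traceback
from dataclasses import dataclass, field
from enum import Enum


class JobState(Enum):
    """Hypothetical job states, mirroring the KILLED vs FAILED distinction."""
    RUNNING = "RUNNING"
    KILLED = "KILLED"
    FAILED = "FAILED"


@dataclass
class JobSketch:
    """Hypothetical stand-in for the AM's view of a job."""
    state: JobState = JobState.RUNNING
    diagnostics: list = field(default_factory=list)


def create_common_launch_context(cache_files, exists):
    """Hypothetical: build the launch context shared by all tasks.

    Raises IOError (an alias of OSError in Python 3) when a distributed
    cache file is missing, standing in for the IOException in the AM trace.
    """
    for path in cache_files:
        if not exists(path):
            raise IOError(f"File does not exist: {path}")
    return {"localResources": list(cache_files)}


def setup_common_context(job, cache_files, exists):
    """Fail the job with diagnostics instead of exiting silently."""
    try:
        return create_common_launch_context(cache_files, exists)
    except IOError as e:
        job.state = JobState.FAILED
        # Record both the exception message and the full stack trace.
        trace = "".join(traceback.format_exception(type(e), e, e.__traceback__))
        job.diagnostics.append(
            "Failed to set up task container launch context: " + trace
        )
        return None


# Usage: every cache file has been deleted, so setup fails.
job = JobSketch()
ctx = setup_common_context(job, ["/user/x/cache/lib.jar"], exists=lambda p: False)
print(job.state.value)  # FAILED
```

Catching the error at the point where the shared context is built keeps a single place responsible for translating setup failures into a terminal job state with diagnostics.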
                
> AM thinks it was killed when an error occurs setting up a task container 
> launch context
> ---------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4993
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4993
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.3-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Abhishek Kapoor
>
> If an IOException occurs while setting up a container launch context for a
> task, then the AM exits with a KILLED status and no diagnostics. The job
> should be marked as FAILED (or maybe ERROR) with a useful diagnostics message
> indicating the nature of the error.

