[ https://issues.apache.org/jira/browse/MAPREDUCE-4993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591455#comment-13591455 ]
Jason Lowe commented on MAPREDUCE-4993:
---------------------------------------
I'm not exactly sure what happened in this case, as I'm just documenting the
poor error handling by the AM on a job I was asked to analyze. From the
stack trace it looks like the AM was trying to set up the common portion of the
task launch contexts and encountered an IOException while processing
distributed cache files because they were deleted. Maybe someone submitted a
job whose distributed cache files in HDFS were deleted while the job was still
in-flight?
Anyway, the problem is, as you point out, that the AM is not properly handling
exceptions while setting up the common container launch context for tasks. If
an error occurs while setting that up, it should fail the job with the job
diagnostics indicating the exception message and stack trace rather than simply
exiting with no diagnostics.
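A minimal sketch of the handling proposed above: wrap the setup of the common launch context in a try/catch, and on an IOException mark the job FAILED with the exception message and stack trace as diagnostics instead of letting the AM exit silently. All names here (LaunchContextSketch, Job, JobState, CommonContextBuilder, setupCommonContext) are hypothetical and do not correspond to the real MRAppMaster classes; the sketch only illustrates the control flow.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical sketch: none of these names are the actual Hadoop MR AM API.
public class LaunchContextSketch {

    enum JobState { RUNNING, FAILED, KILLED }

    // Hypothetical stand-in for the AM's view of the job.
    static final class Job {
        JobState state = JobState.RUNNING;
        final StringBuilder diagnostics = new StringBuilder();

        void fail(String diag) {
            state = JobState.FAILED;
            diagnostics.append(diag);
        }
    }

    // Hypothetical stand-in for building the part of the launch context
    // shared by all tasks (e.g. processing distributed cache files).
    interface CommonContextBuilder {
        void build() throws IOException;
    }

    static String stackTraceOf(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw));
        return sw.toString();
    }

    // Returns true on success. On IOException, records the message and
    // stack trace as job diagnostics and marks the job FAILED (not KILLED).
    static boolean setupCommonContext(Job job, CommonContextBuilder builder) {
        try {
            builder.build();
            return true;
        } catch (IOException e) {
            job.fail("Failed to set up task container launch context: "
                    + e.getMessage() + "\n" + stackTraceOf(e));
            return false;
        }
    }

    public static void main(String[] args) {
        Job job = new Job();
        // Simulate a distributed cache file deleted while the job is running.
        boolean ok = setupCommonContext(job, () -> {
            throw new IOException("distributed cache file no longer exists");
        });
        System.out.println(ok + " " + job.state);
    }
}
```

Running `main` prints `false FAILED`, and `job.diagnostics` carries the message and stack trace that the client would otherwise never see.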
> AM thinks it was killed when an error occurs setting up a task container
> launch context
> ---------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-4993
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4993
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am
> Affects Versions: 2.0.3-alpha, 0.23.5
> Reporter: Jason Lowe
> Assignee: Abhishek Kapoor
>
> If an IOException occurs while setting up a container launch context for a
> task then the AM exits with a KILLED status and no diagnostics. The job
> should be marked as FAILED (or maybe ERROR) with a useful diagnostics message
> indicating the nature of the error.