[
https://issues.apache.org/jira/browse/MAPREDUCE-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16221178#comment-16221178
]
Haibo Chen commented on MAPREDUCE-6984:
---------------------------------------
Thanks [~grepas] for the patch! A few high-level comments:
1) The cleanup of previous attempts does not impact the current attempt in any
case, so I think we should catch all exceptions in
cleanUpPreviousAttemptOutput() instead of just IOExceptions.
2) If there are multiple previous attempts and the most recent one already
cleaned up the attempts before it, the current attempt would get a
FileNotFoundException during cleanup, resulting in bogus error messages from
LOG.error("Error while trying to clean up previous attempt (" + appAttemptId + ")", e);
We could catch FileNotFoundException first and ignore it, and then log any
other exception, like this:
{code}
private void cleanUpPreviousAttemptOutput(ApplicationAttemptId appAttemptId) {
  Configuration configuration = new Configuration(getConfig());
  configuration.setInt(MRJobConfig.APPLICATION_ATTEMPT_ID,
      appAttemptId.getAttemptId());
  JobContext jobContext = getJobContextFromConf(configuration);
  try {
    LOG.info("Starting to clean up previous attempt's (" +
        appAttemptId + ") temporary files");
    OutputCommitter outputCommitter = createOutputCommitter(configuration);
    outputCommitter.abortJob(jobContext, State.KILLED);
    LOG.info("Finished cleaning up previous attempt's (" +
        appAttemptId + ") temporary files");
  } catch (FileNotFoundException e) {
    // safely ignore: the files were already removed by an earlier cleanup
  } catch (Exception e) {
    // the clean up of a previous attempt is not critical to the success
    // of this job - only logging the error
    LOG.error("Error while trying to clean up previous attempt (" +
        appAttemptId + ")", e);
  }
}
{code}
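The order of the two catch clauses matters: FileNotFoundException is a subclass of IOException (and therefore of Exception), so the narrower clause has to come first, otherwise the compiler rejects it as unreachable. Below is a minimal, self-contained sketch of just that pattern, with no Hadoop dependencies. The class CleanupSketch, the Cleanup interface and the returned strings are hypothetical names made up for illustration; they are not part of the patch.

```java
import java.io.FileNotFoundException;

public class CleanupSketch {
    /** Hypothetical stand-in for the committer's abort step; not a Hadoop type. */
    public interface Cleanup {
        void run() throws Exception;
    }

    /**
     * Runs a best-effort cleanup. A missing file is treated as "already
     * cleaned"; any other failure is reported but never rethrown.
     */
    public static String cleanUpQuietly(Cleanup cleanup) {
        try {
            cleanup.run();
            return "cleaned";
        } catch (FileNotFoundException e) {
            // Narrower exception first: nothing left to delete, so ignore it.
            return "ignored";
        } catch (Exception e) {
            // Everything else: the real code would call LOG.error(...) here.
            return "logged: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(cleanUpQuietly(() -> { }));
        System.out.println(cleanUpQuietly(() -> {
            throw new FileNotFoundException("gone");
        }));
        System.out.println(cleanUpQuietly(() -> {
            throw new IllegalStateException("boom");
        }));
    }
}
```

Returning a status string is only there to make the three outcomes observable in the sketch; the method in the patch stays void and just logs.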
3) We only recover successful tasks, so we could also clean up
failed/killed tasks even when recovery is enabled. But that's another
optimization; I am OK with not doing it if it is complicated.
> MR AM to clean up temporary files from previous attempt
> -------------------------------------------------------
>
> Key: MAPREDUCE-6984
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6984
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: applicationmaster
> Affects Versions: 3.0.0-beta1
> Reporter: Gergo Repas
> Assignee: Gergo Repas
> Attachments: MAPREDUCE-6984.000.patch, MAPREDUCE-6984.001.patch
>
>
> When the MR AM restarts, the
> {outputDir}/_temporary/{appAttemptNumber} directory
> remains on HDFS, even though this directory is not used during the next
> attempt if the restart has been done without recovery. So if recovery is not
> used for the AM restart, then the deletion of this directory can be done
> earlier (at the start of the next attempt). The benefit is that more free
> HDFS space is available for the next attempt.