[ https://issues.apache.org/jira/browse/MAPREDUCE-6478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14876129#comment-14876129 ]
Hudson commented on MAPREDUCE-6478:
-----------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #411 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/411/])
MAPREDUCE-6478. Add an option to skip cleanupJob stage or ignore cleanup failure during commitJob. (Junping Du via wangda) (wangda: rev 372ad270a0d7ea5c581cd9a42b3c3cb189eca204)
* hadoop-yarn-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.java

> Add an option to skip cleanupJob stage or ignore cleanup failure during
> commitJob().
> ------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6478
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6478
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Fix For: 2.8.0
>
>     Attachments: MAPREDUCE-6478-v1.1.patch, MAPREDUCE-6478-v1.patch
>
>
> In some of our test cases for MR in public cloud scenarios, a very big MR job
> with hundreds or thousands of reducers cannot finish successfully because of
> job cleanup failures. These are caused by the different scale/performance
> characteristics of cloud file systems (like AzureFS), which replace HDFS's
> whole-directory deletion with REST API calls that delete each sub-directory
> recursively. Even when cleanup succeeds, it can take much longer (hours),
> which is unnecessary and wastes time/resources, especially in a public cloud
> scenario.
> In these scenarios, it makes more sense to ignore some cleanupJob failures,
> or to let the user choose to skip cleanupJob() completely. Finishing the
> whole job successfully, at the cost of wasting some user space, is the better
> trade-off: user jobs usually come and go in the public cloud, so the option
> to tolerate some leftover temporary files in exchange for avoiding a big job
> re-run (or shortening the job's running time) is quite effective in
> time/resource cost.
> We should allow the user to have this option (ignore cleanup failures or skip
> the job cleanup stage completely), especially when the user knows the cleanup
> failure is not due to an abnormal HDFS status but to another file system's
> different performance trade-offs.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
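A job would opt in through its configuration. The sketch below shows how such a mapred-site.xml (or per-job) fragment might look; the two property names are my assumption of what the committed FileOutputCommitter change exposes in 2.8.0 and should be verified against the patched source, not taken as confirmed:

```xml
<!-- Assumed property names; verify against FileOutputCommitter in the
     committed patch before relying on them. -->
<property>
  <name>mapreduce.fileoutputcommitter.cleanup.skipped</name>
  <value>true</value>
  <description>Skip the cleanupJob stage entirely during commitJob().</description>
</property>
<property>
  <name>mapreduce.fileoutputcommitter.cleanup-failures.ignored</name>
  <value>true</value>
  <description>Treat cleanup failures as non-fatal during commitJob(),
  so the job still completes successfully.</description>
</property>
```

Either setting trades leftover temporary files in the output tree for a successfully committed job, which matches the public-cloud scenario described above.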