[ https://issues.apache.org/jira/browse/MAPREDUCE-6478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14876373#comment-14876373 ]

Hudson commented on MAPREDUCE-6478:
-----------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #392 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/392/])
MAPREDUCE-6478. Add an option to skip cleanupJob stage or ignore cleanup 
failure during commitJob. (Junping Du via wangda) (wangda: rev 
372ad270a0d7ea5c581cd9a42b3c3cb189eca204)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.java
* hadoop-yarn-project/CHANGES.txt


> Add an option to skip cleanupJob stage or ignore cleanup failure during 
> commitJob().
> ------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6478
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6478
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Junping Du
>            Assignee: Junping Du
>             Fix For: 2.8.0
>
>         Attachments: MAPREDUCE-6478-v1.1.patch, MAPREDUCE-6478-v1.patch
>
>
> In some of our test cases running MR on a public cloud, a very large MR job
> with hundreds or thousands of reducers cannot finish successfully because of
> job cleanup failures. These failures come from the different scale and
> performance characteristics of cloud file systems (such as AzureFS). Such file
> systems replace HDFS's single whole-directory deletion with REST API calls
> that delete each sub-directory recursively. Even when cleanup succeeds, it can
> take much longer (hours). That time is unnecessary and wastes resources,
> especially on a public cloud.
> In these scenarios, it makes more sense to ignore some cleanupJob() failures,
> or to let the user skip cleanupJob() completely. Letting the whole job finish
> successfully at the cost of wasting some user space is much better. Users'
> jobs on a public cloud typically come and go, so tolerating some leftover
> temporary files is an effective trade-off in time and resource cost. It
> avoids re-running a big job and saves the job's running time.
> We should give users this option (ignore cleanup failures, or skip the job
> cleanup stage completely). It matters most when the user knows the cleanup
> failure is caused by another file system's different performance trade-off,
> not by an abnormal HDFS state.
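
The description asks for two behaviours: skip cleanup entirely, or run it and ignore a failure. The following is a minimal, dependency-free sketch of how those two options could change the outcome of a job commit. The class, the `Cleaner` interface and the two boolean parameters are hypothetical illustrations. They are not the actual FileOutputCommitter API or its configuration keys; check FileOutputCommitter.java for the real option names.

```java
import java.io.IOException;

// Hypothetical model of the commitJob() cleanup decision discussed in
// MAPREDUCE-6478. Names are illustrative, not Hadoop's real API.
public class CleanupPolicySketch {

    // Stand-in for deleting the job's temporary output directory, which on
    // some cloud file systems turns into many recursive REST calls.
    interface Cleaner {
        void cleanup() throws IOException;
    }

    // Returns true if the job commit is reported as successful.
    // skipCleanup: do not run the cleanup stage at all (temp files remain).
    // ignoreFailures: run cleanup, but do not fail the job if it throws.
    static boolean commitJob(Cleaner cleaner, boolean skipCleanup,
                             boolean ignoreFailures) {
        // Assumption: task output has already been promoted to the final
        // output directory before this point.
        if (skipCleanup) {
            return true;
        }
        try {
            cleaner.cleanup();
        } catch (IOException e) {
            if (!ignoreFailures) {
                return false; // default behaviour: a cleanup failure fails the job
            }
            System.err.println("Ignoring cleanup failure: " + e.getMessage());
        }
        return true;
    }

    public static void main(String[] args) {
        // Simulates a cloud file system whose recursive delete fails.
        Cleaner slowCloudDelete = () -> {
            throw new IOException("recursive delete timed out");
        };
        System.out.println(commitJob(slowCloudDelete, false, false)); // false
        System.out.println(commitJob(slowCloudDelete, false, true));  // true
        System.out.println(commitJob(slowCloudDelete, true, false));  // true
    }
}
```

Both options trade leftover temporary files for a successful job. The difference is that skipping never pays the (possibly hours-long) deletion cost, while ignoring failures still attempts the cleanup.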



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
