[ https://issues.apache.org/jira/browse/MAPREDUCE-7448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated MAPREDUCE-7448:
--------------------------------------
    Labels: pull-request-available  (was: )

> Inconsistent Behavior for FileOutputCommitter V1 to commit successfully many times
> ----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7448
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7448
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: ConfX
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: reproduce.sh
>
>
> h2. What happened
> With {{mapreduce.fileoutputcommitter.cleanup.skipped=true}}, version 1 of
> {{FileOutputCommitter}} can commit the same job several times, which is
> unexpected.
> h2. Where's the problem
> In {{FileOutputCommitter.commitJobInternal}},
> {noformat}
> if (algorithmVersion == 1) {
>   for (FileStatus stat: getAllCommittedTaskPaths(context)) {
>     mergePaths(fs, stat, finalOutput, context);
>   }
> }
> if (skipCleanup) {
>   LOG.info("Skip cleanup the _temporary folders under job's output " +
>       "directory in commitJob.");
> ...{noformat}
> Here, if cleanup is skipped, the _temporary folder is not deleted and the
> _SUCCESS file is also not created, so the next call to {{mergePaths}} does
> not fail.
> h2. How to reproduce
> # Set {{mapreduce.fileoutputcommitter.cleanup.skipped}}={{true}}
> # Run
> {{org.apache.hadoop.mapred.TestFileOutputCommitter#testCommitterWithDuplicatedCommitV1}}
> You should observe
> {noformat}
> java.lang.AssertionError: Duplicate commit successful: wrong behavior for version 1.
>     at org.junit.Assert.fail(Assert.java:89)
>     at org.apache.hadoop.mapred.TestFileOutputCommitter.testCommitterWithDuplicatedCommitInternal(TestFileOutputCommitter.java:295)
>     at org.apache.hadoop.mapred.TestFileOutputCommitter.testCommitterWithDuplicatedCommitV1(TestFileOutputCommitter.java:269){noformat}
> For an easy reproduction, run the reproduce.sh in the attachment.
> We are happy to provide a patch if this issue is confirmed.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
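
The mechanism described under "Where's the problem" can be illustrated with a
small stand-alone toy model. This is only a sketch and not Hadoop code: the
class DuplicateCommitSketch, its methods, and the flat _temporary layout are
hypothetical, and plain java.nio.file calls stand in for the Hadoop FileSystem
API. It assumes that a duplicate commit is rejected only because the pending
directory no longer exists after cleanup.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Toy model of a v1-style job commit. NOT Hadoop's FileOutputCommitter. */
final class DuplicateCommitSketch {

    /** Hypothetical stand-in for the job's pending directory. */
    static final String PENDING_DIR = "_temporary";

    /**
     * Moves every file from output/_temporary into output, then deletes
     * _temporary unless skipCleanup is set. Throws NoSuchFileException when
     * _temporary is already gone, which is the only guard in this model.
     */
    static void commitJob(Path output, boolean skipCleanup) throws IOException {
        Path pending = output.resolve(PENDING_DIR);
        List<Path> taskFiles;
        try (Stream<Path> listing = Files.list(pending)) {
            taskFiles = listing.collect(Collectors.toList());
        }
        for (Path file : taskFiles) {
            Files.move(file, output.resolve(file.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        }
        if (!skipCleanup) {
            Files.delete(pending); // empty by now, so a plain delete suffices
        }
    }

    /** Commits the same job twice and reports whether the second commit succeeded. */
    static boolean secondCommitSucceeds(boolean skipCleanup) {
        try {
            Path output = Files.createTempDirectory("commit-sketch");
            Path pending = Files.createDirectory(output.resolve(PENDING_DIR));
            Files.write(pending.resolve("part-00000"),
                    "data".getBytes(StandardCharsets.UTF_8));
            commitJob(output, skipCleanup); // first commit
            try {
                commitJob(output, skipCleanup); // duplicate commit
                return true;
            } catch (NoSuchFileException e) {
                return false; // pending directory was cleaned up
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("cleanup performed, duplicate commit succeeds: "
                + secondCommitSucceeds(false));
        System.out.println("cleanup skipped, duplicate commit succeeds: "
                + secondCommitSucceeds(true));
    }
}
```

In this model the second commit is rejected only when cleanup ran. With cleanup
skipped, the second commit finds an empty but still existing _temporary
directory and completes without error, which mirrors the behavior reported
above.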