ConfX created MAPREDUCE-7448:
--------------------------------
Summary: Inconsistent Behavior for FileOutputCommitter V1 to
commit successfully many times
Key: MAPREDUCE-7448
URL: https://issues.apache.org/jira/browse/MAPREDUCE-7448
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: ConfX
Attachments: reproduce.sh
h2. What happened
After setting {{mapreduce.fileoutputcommitter.cleanup.skipped=true}}, version 1
of {{FileOutputCommitter}} can commit the same job several times without
failing, which is unexpected.
h2. Where's the problem
In {{FileOutputCommitter.commitJobInternal}},
{noformat}
if (algorithmVersion == 1) {
  for (FileStatus stat: getAllCommittedTaskPaths(context)) {
    mergePaths(fs, stat, finalOutput, context);
  }
}
if (skipCleanup) {
  LOG.info("Skip cleanup the _temporary folders under job's output " +
      "directory in commitJob.");
...
{noformat}
If cleanup is skipped, the {{_temporary}} folder is not deleted and the
{{_SUCCESS}} file is not created either, so the next {{mergePaths}} call does
not fail.
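The effect can be illustrated with a small toy model that uses only the local filesystem and no Hadoop classes. It is a sketch under a stated assumption, not the actual committer logic: here a repeated commit fails only because listing the {{_temporary}} directory fails once that directory has been deleted. The class {{DuplicateCommitSketch}} and its methods are hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical toy model; none of this is Hadoop code.
public class DuplicateCommitSketch {

    // Stand-in for a V1 job commit: promote every file under
    // <output>/_temporary into <output>, then optionally delete _temporary.
    static void commitJob(Path output, boolean skipCleanup) throws IOException {
        Path temporary = output.resolve("_temporary");
        List<Path> pending = new ArrayList<>();
        // Files.list throws NoSuchFileException when _temporary is gone.
        // In this model, that is the only thing that makes a repeated commit fail.
        try (Stream<Path> entries = Files.list(temporary)) {
            entries.forEach(pending::add);
        }
        for (Path file : pending) {
            Files.move(file, output.resolve(file.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        }
        if (!skipCleanup) {
            // _temporary is empty after the moves, so a plain delete suffices.
            Files.delete(temporary);
        }
    }

    // Runs two commits against a fresh output directory and reports
    // whether the second (duplicate) commit succeeded.
    static boolean secondCommitSucceeds(boolean skipCleanup) throws IOException {
        Path output = Files.createTempDirectory("committer-sketch");
        Path temporary = Files.createDirectory(output.resolve("_temporary"));
        Files.write(temporary.resolve("part-00000"), "data".getBytes());
        commitJob(output, skipCleanup);      // first commit
        try {
            commitJob(output, skipCleanup);  // duplicate commit
            return true;
        } catch (NoSuchFileException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("cleanup performed, duplicate commit succeeds: "
                + secondCommitSucceeds(false));
        System.out.println("cleanup skipped, duplicate commit succeeds: "
                + secondCommitSucceeds(true));
    }
}
```

In this model the duplicate commit is rejected when cleanup runs and accepted when cleanup is skipped, because the leftover empty {{_temporary}} directory can still be listed.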
h2. How to reproduce
# set {{mapreduce.fileoutputcommitter.cleanup.skipped}}={{true}}
# run {{org.apache.hadoop.mapred.TestFileOutputCommitter#testCommitterWithDuplicatedCommitV1}}

You should observe:
{noformat}
java.lang.AssertionError: Duplicate commit successful: wrong behavior for version 1.
    at org.junit.Assert.fail(Assert.java:89)
    at org.apache.hadoop.mapred.TestFileOutputCommitter.testCommitterWithDuplicatedCommitInternal(TestFileOutputCommitter.java:295)
    at org.apache.hadoop.mapred.TestFileOutputCommitter.testCommitterWithDuplicatedCommitV1(TestFileOutputCommitter.java:269)
{noformat}
For an easy reproduction, run the attached {{reproduce.sh}}.
We are happy to provide a patch if this issue is confirmed.