[
https://issues.apache.org/jira/browse/HADOOP-16357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860088#comment-16860088
]
Steve Loughran commented on HADOOP-16357:
-----------------------------------------
I'm moving this to hadoop-common and pointing the blame at the staging
committer: it doesn't need to be checking the destination directory.
The normal FileOutputCommitter and the Magic committer both just create the
directory and don't worry about the rest.
The staging committer will fail if the dest dir is there, because the default
conflict mode is "fail". For the others it is, implicitly, "append".
Proposed:
* change the default mode to append, so as to be consistent.
* update the docs
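
To make the difference between the modes concrete, here is a minimal sketch of
the decision being discussed. It is not the StagingCommitter code: the class,
enum and method names are hypothetical, and it only models the eventual outcome
for the destination directory, not when the real committer acts on it. The
"replace" mode is included because the staging committers also accept it; the
real setting is read from fs.s3a.committer.staging.conflict-mode.

```java
import java.util.Locale;

/** Hypothetical model of staging committer conflict modes; not Hadoop code. */
public class ConflictModeSketch {

    /** Conflict modes accepted by the staging committers. */
    enum ConflictMode { FAIL, APPEND, REPLACE }

    /** Simplified outcome for the destination directory. */
    enum Outcome { DIR_CREATED, JOB_FAILS, EXISTING_DATA_KEPT, EXISTING_DATA_REPLACED }

    /** Parse a configured value; null or blank falls back to the given default. */
    static ConflictMode parse(String configured, ConflictMode defaultMode) {
        if (configured == null || configured.trim().isEmpty()) {
            return defaultMode;
        }
        return ConflictMode.valueOf(configured.trim().toUpperCase(Locale.ROOT));
    }

    /** Decide what happens given the mode and whether the destination exists. */
    static Outcome outcome(ConflictMode mode, boolean destExists) {
        if (!destExists) {
            // Nothing to conflict with: every mode just creates the directory.
            return Outcome.DIR_CREATED;
        }
        switch (mode) {
            case FAIL:
                // Current default: the job is rejected (the TeraSort failure).
                return Outcome.JOB_FAILS;
            case REPLACE:
                return Outcome.EXISTING_DATA_REPLACED;
            default:
                // APPEND: what the other committers do implicitly.
                return Outcome.EXISTING_DATA_KEPT;
        }
    }

    public static void main(String[] args) {
        // TeraSort creates the output path first, so destExists is true.
        ConflictMode today = parse(null, ConflictMode.FAIL);
        ConflictMode proposed = parse(null, ConflictMode.APPEND);
        System.out.println("default=fail:   " + outcome(today, true));
        System.out.println("default=append: " + outcome(proposed, true));
    }
}
```

With the default switched to append, an unset option no longer rejects a job
whose destination already exists, which matches the proposal above.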
Note: if you are using this in Spark, Spark itself fails fast if the dest dir
exists, so to actually have append there you need to tell the spark.write
operation to overwrite. Moot for TeraSort.
> Fix TeraSort Job failing on S3 DirectoryStagingCommitter
> --------------------------------------------------------
>
> Key: HADOOP-16357
> URL: https://issues.apache.org/jira/browse/HADOOP-16357
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 3.1.0, 3.2.0
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Minor
> Attachments: MAPREDUCE-7216-001.patch
>
>
> TeraSort Job fails on S3 with the exception below. TeraSort creates the
> output path and writes the partition filename, but DirectoryStagingCommitter
> expects the output path to not exist.
> {code}
> 9/06/07 14:13:34 INFO mapreduce.Job: Job job_1559891760159_0011 failed with
> state FAILED due to: Job setup failed :
> org.apache.hadoop.fs.PathExistsException: `s3a://bucket/OUTPUT': Setting job
> as Task committer attempt_1559891760159_0011_m_000000_0: Destination path
> exists and committer conflict resolution mode is "fail"
> at org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter.failDestinationExists(StagingCommitter.java:878)
> at org.apache.hadoop.fs.s3a.commit.staging.DirectoryStagingCommitter.setupJob(DirectoryStagingCommitter.java:71)
> at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobSetup(CommitterEventHandler.java:255)
> at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:235)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> Creating partition filename in /tmp or some other directory fixes the issue.