[ 
https://issues.apache.org/jira/browse/HADOOP-16357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860143#comment-16860143
 ] 

Steve Loughran commented on HADOOP-16357:
-----------------------------------------

also, -1 on the patch. Making the problem "go away" by changing terasort is not 
the solution, as it breaks the whole idea of "changing committer must be 
transparent"

Instead I'm going to change the default value. FWIW,  my initial patch does 
this but also adds a check for the dest path being a file, and fails fast for 
all committers if that holds. And that hits the fact that the mock tests are so 
brittle they all go on to fail at this point with the move from fs.exists() to 
fs.getFileStatus. That is: fixing the mock tests to catch up with a single API 
call change is more effort than any other part of the patch

> TeraSort Job failing on S3 DirectoryStagingCommitter: destination path exists
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-16357
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16357
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.1.0, 3.2.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Minor
>         Attachments: HADOOP-16357-001.patch, MAPREDUCE-7216-001.patch
>
>
> TeraSort Job fails on S3 with below exception. Terasort creates OutputPath 
> and writes partition filename but DirectoryStagingCommitter expects output 
> path to not exist.
> {code}
> 9/06/07 14:13:34 INFO mapreduce.Job: Job job_1559891760159_0011 failed with 
> state FAILED due to: Job setup failed : 
> org.apache.hadoop.fs.PathExistsException: `s3a://bucket/OUTPUT': Setting job 
> as Task committer attempt_1559891760159_0011_m_000000_0: Destination path 
> exists and committer conflict resolution mode is "fail"
>       at 
> org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter.failDestinationExists(StagingCommitter.java:878)
>       at 
> org.apache.hadoop.fs.s3a.commit.staging.DirectoryStagingCommitter.setupJob(DirectoryStagingCommitter.java:71)
>       at 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobSetup(CommitterEventHandler.java:255)
>       at 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:235)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> {code}
> Creating partition filename in /tmp or some other directory fixes the issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to