[ 
https://issues.apache.org/jira/browse/HADOOP-14971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16227367#comment-16227367
 ] 

Steve Loughran commented on HADOOP-14971:
-----------------------------------------

Patch 041 (rebased to trunk; hopefully the merge will now take).

* Retry rollback was incomplete; TestInvoker was failing. Fixed.

Error handling now makes the cause of failure on a commit conflict (especially 
a partitioned commit conflict) more meaningful:
* A new error message, "Destination path exists and committer conflict 
resolution mode is FAIL", in InternalCommitterConstants
* Used in DirectoryCommitter job setup/commit, and PartitionedCommitter task 
commit
* Coverage in the troubleshooting section of committer.md

This required making public the heretofore protected PathExistsException(path, 
message) constructor.

* Removed the staging committer's restriction that rejected an S3A path as the 
in-cluster temp dir. If you have S3Guard on or a consistent store, it should 
actually work. Ewan: enjoy!

This is in sync with the latest test work I've done in my Spark cloud 
integration tests, where I've been verifying the ability to specify both the 
committer and the conflict mode from inside the Spark workflow by setting the 
options on the writer. These get passed to the job configuration, where the 
S3A committer factory uses them to choose the committer, and the committers 
use them to choose their conflict policy.

{code}
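      // options set on the writer are passed down to the job configuration
      val writer = source.write
      writer.partitionBy("year", "month")
      writer.mode(SaveMode.Append)
      writer.option(S3A_COMMITTER_NAME, committer) // which committer to use
      writer.option(CONFLICT_MODE, conflict)       // its conflict resolution mode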
      writer
        .format(format)
        .save(dest.toUri.toString)
{code}

The Spark tests verify conflict behaviour, especially in partitioned commits. 
The error message is there to help people identify the problem: when they see 
a task failure reported to the driver, they should interpret it as nothing 
other than the successful operation of the named committer.
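
As a rough illustration of the conflict check described above, here is a 
minimal, self-contained sketch. It is a hypothetical model, not the committer 
code in the patch: the names ConflictMode, CommitAction and ConflictResolution 
are invented for the example, the path is made up, and the three modes (fail, 
append, replace) are assumed to be the ones the staging committers accept. 
Only the error text is taken from the new message quoted above.

```scala
// Hypothetical model of the conflict check; NOT the actual Hadoop committer code.
sealed trait ConflictMode
object ConflictMode {
  case object Fail extends ConflictMode
  case object Append extends ConflictMode
  case object Replace extends ConflictMode

  // Parse a mode name case-insensitively (assumed option values).
  def parse(s: String): ConflictMode = s.trim.toLowerCase match {
    case "fail"    => Fail
    case "append"  => Append
    case "replace" => Replace
    case other     => throw new IllegalArgumentException(s"Unknown conflict mode: $other")
  }
}

// What the commit would do once the conflict check has passed.
sealed trait CommitAction
case object WriteNewFiles extends CommitAction   // add files, keep any existing data
case object DeleteThenWrite extends CommitAction // remove existing data first

object ConflictResolution {
  // Error text as quoted in this comment.
  val FailMessage: String =
    "Destination path exists and committer conflict resolution mode is FAIL"

  // Left(error) models the failure surfaced to the driver;
  // Right(action) models a commit that is allowed to proceed.
  def resolve(mode: ConflictMode, destExists: Boolean, path: String): Either[String, CommitAction] =
    (mode, destExists) match {
      case (ConflictMode.Fail, true)    => Left(s"$path: $FailMessage")
      case (ConflictMode.Replace, true) => Right(DeleteThenWrite)
      case _                            => Right(WriteNewFiles)
    }
}
```

In this model a Left result with mode fail is the committer doing exactly what 
it was asked to do, which is the point the error message is meant to make 
clear. For a partitioned commit the same check would be applied per 
destination partition rather than to the whole output directory.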

> Merge S3A committers into trunk
> -------------------------------
>
>                 Key: HADOOP-14971
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14971
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13786-040.patch, HADOOP-13786-041.patch
>
>
> Merge the HADOOP-13786 committer into trunk. This branch is being set up as a 
> github PR for review there & to keep it out the mailboxes of the watchers on 
> the main JIRA



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

