[
https://issues.apache.org/jira/browse/HADOOP-14971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16227367#comment-16227367
]
Steve Loughran commented on HADOOP-14971:
-----------------------------------------
Patch 041, rebased to trunk; hopefully it now applies cleanly.
* The retry rollback was incomplete; TestInvoker was failing. Fixed.
Error handling now makes the cause of a failure on commit conflict (especially
a partitioned commit conflict) more meaningful.
* A new error message "Destination path exists and committer conflict
resolution mode is FAIL" in InternalCommitterConstants
* Used in DirectoryCommitter job setup/commit, and PartitionedCommitter task
commit
* Coverage in troubleshooting section in committer.md
This required making public the previously protected PathExistsException(path,
message) constructor.
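
To illustrate the kind of check this message supports, here is a minimal sketch, not the code in the patch. It uses only the JDK: java.nio.file.Path stands in for a Hadoop Path and FileAlreadyExistsException stands in for PathExistsException(path, message). The class name, the method name checkDestination and the constant name E_DEST_EXISTS are hypothetical; the message text is the one quoted above, and the three modes are the staging committers' fail/append/replace options.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative stand-in for a destination conflict check; not the Hadoop code. */
final class ConflictCheckSketch {

  /** Conflict resolution modes of the staging committers. */
  enum ConflictMode { FAIL, APPEND, REPLACE }

  /** Message text quoted from the patch notes; the constant name is hypothetical. */
  static final String E_DEST_EXISTS =
      "Destination path exists and committer conflict resolution mode is FAIL";

  /**
   * Hypothetical check: reject an existing destination only in FAIL mode.
   * FileAlreadyExistsException(file, other, reason) stands in for
   * PathExistsException(path, message).
   */
  static void checkDestination(Path dest, ConflictMode mode) throws IOException {
    if (mode == ConflictMode.FAIL && Files.exists(dest)) {
      throw new FileAlreadyExistsException(dest.toString(), null, E_DEST_EXISTS);
    }
    // APPEND and REPLACE accept an existing destination, so nothing to reject here.
  }

  public static void main(String[] args) throws IOException {
    Path existing = Files.createTempDirectory("dest");
    try {
      checkDestination(existing, ConflictMode.APPEND); // accepted
      checkDestination(existing, ConflictMode.FAIL);   // rejected
    } catch (FileAlreadyExistsException e) {
      // The reason carries the explicit conflict message.
      System.out.println(e.getReason());
    } finally {
      Files.delete(existing);
    }
  }
}
```

The point of carrying the message in the exception is that whoever reads the stack trace sees the conflict mode named as the cause, rather than a bare "path exists".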
* Removed the restriction whereby the staging committer rejected an S3A path as
the in-cluster temp dir. If you have S3Guard on or a consistent store, it should
actually work. Ewan: enjoy!
This goes in sync with the latest test work I've done in my Spark cloud
integration tests, where I've been verifying the ability to specify both the
committer and conflict mode from inside the Spark workflow, by setting the
options on the writer. These get passed to the job configuration, where the
S3A committer factory uses them to choose the committer, and the committers use
them to choose their conflict resolution mode.
{code}
// committer and conflict mode are set as per-write options;
// they are passed down to the job configuration
source.write
  .partitionBy("year", "month")
  .mode(SaveMode.Append)
  .option(S3A_COMMITTER_NAME, committer)
  .option(CONFLICT_MODE, conflict)
  .format(format)
  .save(dest.toUri.toString)
{code}
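
As a rough sketch of that hand-off, assuming the writer options are simply overlaid on the job configuration: the class and method names below are hypothetical, plain maps stand in for the Hadoop Configuration, and the key strings and default values are assumptions for illustration rather than anything taken from the patch.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative sketch of option pass-through and committer selection; not the Hadoop code. */
final class CommitterSelectionSketch {

  // Assumed option keys; the tests refer to them only by these constant names.
  static final String S3A_COMMITTER_NAME = "fs.s3a.committer.name";
  static final String CONFLICT_MODE = "fs.s3a.committer.staging.conflict-mode";

  /** Overlay the per-write options on a copy of the base job configuration. */
  static Map<String, String> jobConfiguration(Map<String, String> base,
                                              Map<String, String> writerOptions) {
    Map<String, String> conf = new HashMap<>(base);
    conf.putAll(writerOptions); // writer options win over cluster-wide settings
    return conf;
  }

  /** Hypothetical factory step: describe the committer the configuration selects. */
  static String chooseCommitter(Map<String, String> conf) {
    // "file" and "fail" are placeholder defaults for this sketch only.
    String name = conf.getOrDefault(S3A_COMMITTER_NAME, "file").toLowerCase(Locale.ROOT);
    String conflict = conf.getOrDefault(CONFLICT_MODE, "fail").toLowerCase(Locale.ROOT);
    switch (name) {
      case "directory":
      case "partitioned":
        // Only the staging committers consult the conflict mode.
        return name + " committer, conflict mode " + conflict;
      case "file":
      case "magic":
        return name + " committer";
      default:
        throw new IllegalArgumentException("Unknown committer: " + name);
    }
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<>();
    options.put(S3A_COMMITTER_NAME, "partitioned");
    options.put(CONFLICT_MODE, "replace");
    System.out.println(chooseCommitter(jobConfiguration(new HashMap<>(), options)));
  }
}
```

Keeping the selection driven purely by configuration is what lets a single Spark job pick a different committer or conflict mode for each write.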
The Spark tests verify conflict behaviour, especially in partitioned commits.
The error message is there to help people identify the problem: when they see a
task failure reported to the driver, they should recognise it as the named
committer working as intended and nothing else.
> Merge S3A committers into trunk
> -------------------------------
>
> Key: HADOOP-14971
> URL: https://issues.apache.org/jira/browse/HADOOP-14971
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.0.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Attachments: HADOOP-13786-040.patch, HADOOP-13786-041.patch
>
>
> Merge the HADOOP-13786 committer into trunk. This branch is being set up as a
> GitHub PR for review there & to keep it out of the mailboxes of the watchers on
> the main JIRA