[
https://issues.apache.org/jira/browse/HADOOP-18776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17735641#comment-17735641
]
Steve Loughran commented on HADOOP-18776:
-----------------------------------------
I understand what you've done and like the patch, but I am going to have to
block it. This is because, as you note, this has the same commit semantics as
the v2 committer, and I consider the v2 committer *to be broken*.
see:
https://github.com/steveloughran/zero-rename-committer/releases/tag/tag_release_2021-05-17
It lacks the ability to recover from task failure. People using v2 don't
usually know or understand that; they just say "oh look, faster, let's use it",
switch to it, and then sometimes it goes wrong. Rarely, but it does happen. I
don't want to risk the same thing recurring on s3a.
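
To make the failure mode concrete, here is a minimal toy model of it. This is
a sketch under stated assumptions, not code from the patch or from any
committer: it uses no Hadoop or S3 APIs, the class and method names are
hypothetical, and it assumes output file names embed the task attempt id, so a
retried attempt writes different names than the failed one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model only: no Hadoop or S3 APIs, all names are hypothetical. */
final class CommitSemanticsSketch {

    /**
     * One task writes filesPerTask files. Attempt 0 dies during commitTask after
     * failAfter uploads have been completed; attempt 1 is the retry and succeeds.
     * Returns the file names visible in the destination once the job has committed.
     */
    static Set<String> runJob(boolean commitInTask, int filesPerTask, int failAfter) {
        Set<String> visible = new TreeSet<>();
        // Models the saved .pendingset data that job commit will complete.
        List<String> pendingSet = new ArrayList<>();
        for (int attempt = 0; attempt < 2; attempt++) {
            boolean fails = attempt == 0;
            List<String> pending = new ArrayList<>();
            for (int f = 0; f < filesPerTask; f++) {
                // Assumption: output file names embed the attempt id.
                pending.add("part-0000-attempt" + attempt + "-file" + f);
            }
            if (commitInTask) {
                // v2-style: uploads are completed one by one inside commitTask,
                // so a crash part-way through leaves some files visible.
                int completed = fails ? Math.min(failAfter, pending.size()) : pending.size();
                visible.addAll(pending.subList(0, completed));
            } else if (!fails) {
                // Job-commit style: a successful commitTask only records the
                // pending uploads; a failed attempt records nothing.
                pendingSet.addAll(pending);
            }
        }
        // Job commit completes whatever the successful attempts recorded.
        visible.addAll(pendingSet);
        return visible;
    }

    public static void main(String[] args) {
        System.out.println("commit in task: " + runJob(true, 3, 2));
        System.out.println("commit in job:  " + runJob(false, 3, 2));
    }
}
```

With three files per task and a failure after two completed uploads, the
commit-in-task variant ends with five visible files (two left over from the
failed attempt plus three from the retry), while the commit-in-job variant
ends with exactly three.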
(Note: v1 commit needs atomic dir rename, so it is broken on gcs; v2 committer
use was allowed there on the grounds that "they are both broken, this is
faster". Now that we have the manifest committer in, gcs has an atomic commit
too.)
Given you want to speed up the s3a committer: are there things which could be
done to the s3a committer to help it while still only materializing output in
job commit? Better parallelism? There was a patch to improve the thread pool
that I think has gone in; if not, it should be reviewed and merged.
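
As an illustration of the parallelism idea only, here is a sketch of completing
all pending uploads from the job driver on a bounded thread pool. It is not the
S3A committer's actual code: the class name, commitAll, and the completeUpload
callback (a stand-in for the S3 complete-multipart-upload call) are
hypothetical, and the pool size is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

/** Sketch only: completeUpload stands in for the real S3 call. */
final class ParallelJobCommitSketch {

    /** Completes every pending upload on a fixed-size pool; returns the count completed. */
    static int commitAll(List<String> pendingUploads, int threads,
                         Consumer<String> completeUpload) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> calls = new ArrayList<>();
            for (String upload : pendingUploads) {
                calls.add(() -> {
                    completeUpload.accept(upload);
                    return upload;
                });
            }
            int done = 0;
            // invokeAll blocks until every task has finished or failed.
            for (Future<String> result : pool.invokeAll(calls)) {
                result.get(); // surfaces the first failure, if any
                done++;
            }
            return done;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("job commit interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("an upload failed to complete", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> uploads = new ArrayList<>();
        for (int i = 0; i < 3000; i++) {
            uploads.add("upload-" + i);
        }
        Set<String> completed = ConcurrentHashMap.newKeySet();
        int n = commitAll(uploads, 32, completed::add);
        System.out.println(n + " uploads completed, " + completed.size() + " distinct");
    }
}
```

Nothing becomes visible until job commit in this shape; the only lever is how
many completions run concurrently in the driver.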
Finally, I'd love to know the size of the jobs where you hit problems, how
they are used, etc. If there's anything you can say publicly, that'd be great.
> Add OptimizedS3AMagicCommitter For Zero Rename Commits to S3 Endpoints
> ----------------------------------------------------------------------
>
> Key: HADOOP-18776
> URL: https://issues.apache.org/jira/browse/HADOOP-18776
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs/s3
> Reporter: Syed Shameerur Rahman
> Priority: Major
> Labels: pull-request-available
>
> The goal is to add a new S3A committer named *OptimizedS3AMagicCommitter*,
> which is another type of S3 Magic committer with better performance at the
> cost of a few tradeoffs.
> The following are the differences in MagicCommitter vs OptimizedMagicCommitter
>
> ||Operation||Magic Committer||*OptimizedS3AMagicCommitter*||
> |commitTask |1. Lists all {{.pending}} files in its attempt directory.
>
> 2. The contents are loaded into a list of single pending uploads.
>
> 3. Saved to a {{.pendingset}} file in the job attempt directory.|1. Lists all
> {{.pending}} files in its attempt directory
>
> 2. The contents are loaded into a list of single pending uploads.
>
> 3. For each pending upload, commit operation is called (complete
> multiPartUpload)|
> |commitJob|1. Loads all {{.pendingset}} files in its job attempt directory
>
> 2. Then every pending commit in the job will be committed.
>
> 3. "SUCCESS" marker is created (if config is enabled)
>
> 4. "__magic" directory is cleaned up.|1. "SUCCESS" marker is created (if
> config is enabled)
>
> 2. "__magic" directory is cleaned up.|
>
> *Performance Benefits :-*
> # The primary performance boost comes from the complete multiPartUpload
> calls being distributed across the task attempts (task containers/executors)
> rather than made by a single job driver. With MagicCommitter, job commit is
> O(files/threads).
> # It also saves a couple of S3 calls: the PUT of the {{.pendingset}} files
> and the READ call to load them in the job driver.
>
> *TradeOffs :-*
> The tradeoffs are similar to those of FileOutputCommitter V2. Users
> migrating from FileOutputCommitter V2 to OptimizedS3AMagicCommitter will see
> no behavioral change as such.
> # During execution, intermediate data becomes visible after commitTask
> operation
> # On a failure, all output must be deleted and the job needs to be restarted.
>
> *Performance Benchmark :-*
> Cluster : c4.8xlarge (EC2 instance)
> Instance : 1 (primary) + 5 (core)
> Data Size : 3TB Partitioned(TPC-DS store_sales data)
> Engine : Apache Spark 3.3.1
> Query: The following query inserts 3000+ files into the table
> directory (ran for 3 iterations)
> {code:java}
> insert into <table> select ss_quantity from store_sales; {code}
> ||Committer||Iteration 1||Iteration 2||Iteration 3||
> |Magic|126|127|122|
> |OptimizedMagic|50|51|58|
> So on average, OptimizedMagicCommitter was *~2.3x* faster than
> MagicCommitter.
>
> _*Note: Unlike MagicCommitter, OptimizedMagicCommitter is not suitable for
> cases where the user requires the guarantee that files are not visible in
> failure scenarios. Given the performance benefit, users may choose to use
> this if they don't require any guarantees or have some mechanism to clean
> up the data before retrying.*_
>