[ https://issues.apache.org/jira/browse/HADOOP-13868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888392#comment-16888392 ]
Hadoop QA commented on HADOOP-13868:
------------------------------------
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} HADOOP-13868 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-13868 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12842566/HADOOP-13868.002.patch |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/16388/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |
This message was automatically generated.
> New defaults for S3A multi-part configuration
> ---------------------------------------------
>
> Key: HADOOP-13868
> URL: https://issues.apache.org/jira/browse/HADOOP-13868
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 2.7.0, 3.0.0-alpha1
> Reporter: Sean Mackrory
> Assignee: Sean Mackrory
> Priority: Major
> Attachments: HADOOP-13868.001.patch, HADOOP-13868.002.patch,
> optimizing-multipart-s3a.sh
>
>
> I've been looking at a big performance regression when writing to S3 from
> Spark that appears to have been introduced with HADOOP-12891.
> In the Amazon SDK, the default threshold for multi-part copies is 320x the
> threshold for multi-part uploads (and the block size is 20x bigger), so I
> don't think it's necessarily wise for us to have them be the same.
> I did some quick tests and it seems to me the point at which multi-part
> copies start being faster is around 512 MB. The effect was smaller, but
> using 104857600 bytes (100 MB, Amazon's default) for the block size was also
> slightly better.
> I propose we do the following, although they're independent decisions:
> (1) Split the configuration. Ideally, I'd like to have
> fs.s3a.multipart.copy.threshold and fs.s3a.multipart.upload.threshold (and
> corresponding properties for the block size). But then there's the question
> of what to do with the existing fs.s3a.multipart.* properties: deprecate
> them, or leave them as a shorthand for configuring both that is overridden
> by the more specific properties?
> (2) Consider increasing the default values. In my tests, 256 MB seemed to be
> where multipart uploads came into their own, and 512 MB was where multipart
> copies started outperforming the alternative. Would be interested to hear
> what other people have seen.
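
For illustration only, the "shorthand overridden by more specific properties"
option from (1) could be sketched as below. This is not taken from either
attached patch. The two specific property names are the ones proposed in the
description and do not exist yet; the 256 MB and 512 MB defaults are the
figures from the tests mentioned in (2), not shipped values; the helper names
and the strict "greater than" comparison are assumptions made for the sketch.

```python
MB = 1024 * 1024

# Existing generic property, used here as a shorthand for both operations.
GENERIC_THRESHOLD = "fs.s3a.multipart.threshold"
# Proposed in the issue description; hypothetical until a patch is committed.
UPLOAD_THRESHOLD = "fs.s3a.multipart.upload.threshold"
COPY_THRESHOLD = "fs.s3a.multipart.copy.threshold"

# Illustrative defaults based on the test figures quoted in the description.
DEFAULT_UPLOAD_THRESHOLD = 256 * MB
DEFAULT_COPY_THRESHOLD = 512 * MB


def resolve_threshold(conf, specific_key, default):
    """Return a threshold in bytes.

    Lookup order: the operation-specific key, then the generic shorthand,
    then the supplied default.
    """
    for key in (specific_key, GENERIC_THRESHOLD):
        if key in conf:
            return int(conf[key])
    return default


def should_use_multipart(size_bytes, threshold_bytes):
    """Assumed rule: go multipart only when the object exceeds the threshold."""
    return size_bytes > threshold_bytes


# A site that set only the generic value plus a copy-specific override.
conf = {
    GENERIC_THRESHOLD: str(128 * MB),
    COPY_THRESHOLD: str(512 * MB),
}
upload = resolve_threshold(conf, UPLOAD_THRESHOLD, DEFAULT_UPLOAD_THRESHOLD)
copy = resolve_threshold(conf, COPY_THRESHOLD, DEFAULT_COPY_THRESHOLD)
```

With this configuration a 300 MB object would be uploaded in parts (the upload
threshold falls back to the generic 128 MB) but copied in a single request
(the copy-specific 512 MB wins). Existing deployments that only set the
generic property would keep their current behaviour for both operations.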
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)