[ 
https://issues.apache.org/jira/browse/HADOOP-13868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889149#comment-16889149
 ] 

Hudson commented on HADOOP-13868:
---------------------------------

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16957 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16957/])
HADOOP-13868. [s3a] New default for S3A multi-part configuration (#1125) 
(github: rev 7f1b76ca3598acb0a0a843b2364c8963c70edf4d)
* (edit) hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
* (edit) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
* (edit) hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md


> New defaults for S3A multi-part configuration
> ---------------------------------------------
>
>                 Key: HADOOP-13868
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13868
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 2.7.0, 3.0.0-alpha1
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>            Priority: Major
>         Attachments: HADOOP-13868.001.patch, HADOOP-13868.002.patch, 
> optimizing-multipart-s3a.sh
>
>
> I've been looking at a big performance regression when writing to S3 from 
> Spark that appears to have been introduced with HADOOP-12891.
> In the Amazon SDK, the default threshold for multi-part copies is 320x the 
> threshold for multi-part uploads (and the block size is 20x bigger), so I 
> don't think it's necessarily wise for us to have them be the same.
> I did some quick tests and it seems to me the sweet spot when multi-part 
> copies start being faster is around 512MB. It wasn't as significant, but 
> using 104857600 (Amazon's default) for the blocksize was also slightly better.
> I propose we do the following, although they're independent decisions:
> (1) Split the configuration. Ideally, I'd like to have 
> fs.s3a.multipart.copy.threshold and fs.s3a.multipart.upload.threshold (and 
> corresponding properties for the block size). But then there's the question 
> of what to do with the existing fs.s3a.multipart.* properties. Deprecation? 
> Leave it as a short-hand for configuring both (that's overridden by the more 
> specific properties?).
> (2) Consider increasing the default values. In my tests, 256 MB seemed to be 
> where multipart uploads came into their own, and 512 MB was where multipart 
> copies started outperforming the alternative. Would be interested to hear 
> what other people have seen.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to