[ https://issues.apache.org/jira/browse/HADOOP-13560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15575244#comment-15575244 ]
ASF GitHub Bot commented on HADOOP-13560:
-----------------------------------------

Github user thodemoor commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/130#discussion_r83415072

--- Diff: hadoop-common-project/hadoop-common/src/main/resources/core-default.xml ---
@@ -1095,10 +1102,50 @@
 <property>
   <name>fs.s3a.fast.upload</name>
   <value>false</value>
-  <description>Upload directly from memory instead of buffering to
-    disk first. Memory usage and parallelism can be controlled as up to
-    fs.s3a.multipart.size memory is consumed for each (part)upload actively
-    uploading (fs.s3a.threads.max) or queueing (fs.s3a.max.total.tasks)</description>
+  <description>
+    Use the incremental block-based fast upload mechanism with
+    the buffering mechanism set in fs.s3a.fast.upload.buffer.
+  </description>
+</property>
+
+<property>
+  <name>fs.s3a.fast.upload.buffer</name>
+  <value>disk</value>
+  <description>
+    The buffering mechanism to use when using S3A fast upload
+    (fs.s3a.fast.upload=true). Values: disk, array, bytebuffer.
+    This configuration option has no effect if fs.s3a.fast.upload is false.
+
+    "disk" will use the directories listed in fs.s3a.buffer.dir as
+    the location(s) to save data prior to being uploaded.
+
+    "array" uses arrays in the JVM heap
+
+    "bytebuffer" uses off-heap memory within the JVM.
+
+    Both "array" and "bytebuffer" will consume memory in a single stream up to the number
+    of blocks set by:
+
+        fs.s3a.multipart.size * fs.s3a.fast.upload.active.blocks.
+
+    If using either of these mechanisms, keep this value low
+
+    The total number of threads performing work across all threads is set by
+    fs.s3a.threads.max, with fs.s3a.max.total.tasks values setting the number of queued
+    work items.
--- End diff --

Completely agree.
A bit further down I propose adding a single explanation in the javadoc and linking to it from the various other locations.

> S3ABlockOutputStream to support huge (many GB) file writes
> ----------------------------------------------------------
>
> Key: HADOOP-13560
> URL: https://issues.apache.org/jira/browse/HADOOP-13560
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.9.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Attachments: HADOOP-13560-branch-2-001.patch,
> HADOOP-13560-branch-2-002.patch, HADOOP-13560-branch-2-003.patch,
> HADOOP-13560-branch-2-004.patch
>
>
> An AWS SDK [issue|https://github.com/aws/aws-sdk-java/issues/367] highlights
> that metadata isn't copied on large copies.
> 1. Add a test to do that large copy/rename and verify that the copy really
> works
> 2. Verify that metadata makes it over.
> Verifying large file rename is important on its own, as it is needed for very
> large commit operations for committers using rename

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
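
As a rough illustration of the per-stream memory bound described in the quoted core-default.xml diff, here is a hypothetical core-site.xml fragment. The property names are taken from the patch; the values are assumptions chosen only to make the arithmetic easy to follow, and are not the defaults proposed in the pull request.

```xml
<!-- Hypothetical core-site.xml overrides; all values are illustrative only. -->
<configuration>
  <property>
    <name>fs.s3a.fast.upload</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.s3a.fast.upload.buffer</name>
    <!-- One of: disk, array, bytebuffer (per the description in the diff). -->
    <value>bytebuffer</value>
  </property>
  <property>
    <name>fs.s3a.multipart.size</name>
    <!-- Assumed value: 64 MiB, written out in bytes. -->
    <value>67108864</value>
  </property>
  <property>
    <name>fs.s3a.fast.upload.active.blocks</name>
    <!-- Assumed value: at most 4 buffered blocks per stream. -->
    <value>4</value>
  </property>
</configuration>
```

Under these assumed values, a single output stream could buffer up to 67108864 * 4 = 268435456 bytes (256 MiB) of off-heap memory, and a process writing several streams at once would need that much for each of them. With "disk" as the buffer, the same data would instead be staged in the directories listed in fs.s3a.buffer.dir.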