ASF GitHub Bot commented on HADOOP-13560:

Github user thodemoor commented on a diff in the pull request:

    --- Diff: hadoop-common-project/hadoop-common/src/main/resources/core-default.xml ---
    @@ -1095,10 +1102,50 @@
    -  <description>Upload directly from memory instead of buffering to
    -    disk first. Memory usage and parallelism can be controlled as up to
    -    fs.s3a.multipart.size memory is consumed for each (part)upload actively
    -    uploading (fs.s3a.threads.max) or queueing 
    +  <description>
    +    Use the incremental block-based fast upload mechanism with
    +    the buffering mechanism set in fs.s3a.fast.upload.buffer.
    +  </description>
    +  <name>fs.s3a.fast.upload.buffer</name>
    +  <value>disk</value>
    +  <description>
    +    The buffering mechanism to use when using S3A fast upload
    +    (fs.s3a.fast.upload=true). Values: disk, array, bytebuffer.
    +    This configuration option has no effect if fs.s3a.fast.upload is false.
    +    "disk" will use the directories listed in fs.s3a.buffer.dir as
    +    the location(s) to save data prior to being uploaded.
    +    "array" uses arrays in the JVM heap.
    +    "bytebuffer" uses off-heap memory within the JVM.
    +    Both "array" and "bytebuffer" will consume memory in a single stream
    +    up to the number of blocks set by:
    +        fs.s3a.multipart.size * fs.s3a.fast.upload.active.blocks.
    +    If using either of these mechanisms, keep this value low.
    +    The total number of threads performing work across all streams is set by
    +    fs.s3a.threads.max, with fs.s3a.max.total.tasks setting the number of
    +    queued work items.
    --- End diff ---
    The total max block (memory/disk) consumption, across all streams, is
    bounded by `fs.s3a.multipart.size * (fs.s3a.fast.upload.active.blocks +
    fs.s3a.max.total.tasks + 1)` bytes for an instance of S3AFileSystem.
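
    As a sketch, the options discussed above could be set together in a
    core-site.xml fragment like the one below. The property names come from
    the diff; the numeric values are purely illustrative choices for working
    through the bound, not the shipped defaults:

    ```xml
    <!-- Illustrative fragment only; values are examples, not defaults. -->
    <property>
      <name>fs.s3a.fast.upload</name>
      <value>true</value>
    </property>
    <property>
      <name>fs.s3a.fast.upload.buffer</name>
      <value>bytebuffer</value>
    </property>
    <property>
      <!-- 104857600 bytes = 100 MB per block/part -->
      <name>fs.s3a.multipart.size</name>
      <value>104857600</value>
    </property>
    <property>
      <name>fs.s3a.fast.upload.active.blocks</name>
      <value>4</value>
    </property>
    <property>
      <name>fs.s3a.max.total.tasks</name>
      <value>5</value>
    </property>
    <!-- With these example values the bound above works out to
         100 MB * (4 active + 5 queued + 1) = 1000 MB of buffered data
         per S3AFileSystem instance. -->
    ```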

> S3ABlockOutputStream to support huge (many GB) file writes
> ----------------------------------------------------------
>                 Key: HADOOP-13560
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13560
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.9.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13560-branch-2-001.patch, 
> HADOOP-13560-branch-2-002.patch, HADOOP-13560-branch-2-003.patch, 
> HADOOP-13560-branch-2-004.patch
> An AWS SDK [issue|https://github.com/aws/aws-sdk-java/issues/367] highlights 
> that metadata isn't copied on large copies.
> 1. Add a test to do that large copy/rename and verify that the copy really 
> works.
> 2. Verify that metadata makes it over.
> Verifying large file rename is important on its own, as it is needed for very 
> large commit operations by committers that use rename.

This message was sent by Atlassian JIRA
