ASF GitHub Bot commented on HADOOP-13560:

Github user thodemoor commented on a diff in the pull request:

    --- Diff: hadoop-common-project/hadoop-common/src/main/resources/core-default.xml ---
    @@ -1095,10 +1102,50 @@
    -  <description>Upload directly from memory instead of buffering to
    -    disk first. Memory usage and parallelism can be controlled as up to
    -    fs.s3a.multipart.size memory is consumed for each (part)upload actively
    -    uploading (fs.s3a.threads.max) or queueing 
    +  <description>
    +    Use the incremental block-based fast upload mechanism with
    +    the buffering mechanism set in fs.s3a.fast.upload.buffer.
    +  </description>
    +  <name>fs.s3a.fast.upload.buffer</name>
    +  <value>disk</value>
    +  <description>
    +    The buffering mechanism to use when using S3A fast upload
    +    (fs.s3a.fast.upload=true). Values: disk, array, bytebuffer.
    +    This configuration option has no effect if fs.s3a.fast.upload is false.
    +    "disk" will use the directories listed in fs.s3a.buffer.dir as
    +    the location(s) to save data prior to being uploaded.
    +    "array" uses arrays in the JVM heap.
    +    "bytebuffer" uses off-heap memory within the JVM.
    +    Both "array" and "bytebuffer" will consume memory in a single stream
    +    up to the number of blocks set by:
    +        fs.s3a.multipart.size * fs.s3a.fast.upload.active.blocks.
    +    If using either of these mechanisms, keep this value low.
    +    The total number of threads performing work across all streams is set by
    +    fs.s3a.threads.max, with fs.s3a.max.total.tasks setting the number of
    +    queued work items.
    --- End diff ---
    The total max block (memory/disk) consumption, across all streams, is
    bounded by `fs.s3a.multipart.size * (fs.s3a.fast.upload.active.blocks +
    fs.s3a.max.total.tasks + 1)` bytes for an instance of S3AFileSystem.
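
    As a sketch, the options discussed above could be set together in a
    core-site.xml fragment like the one below. The property names come from
    the diff; the numeric values are purely illustrative choices for working
    through the bound, not the shipped defaults:

    ```xml
    <!-- Illustrative fragment only; values are examples, not defaults. -->
    <property>
      <name>fs.s3a.fast.upload</name>
      <value>true</value>
    </property>
    <property>
      <name>fs.s3a.fast.upload.buffer</name>
      <value>bytebuffer</value>
    </property>
    <property>
      <!-- 104857600 bytes = 100 MB per block/part -->
      <name>fs.s3a.multipart.size</name>
      <value>104857600</value>
    </property>
    <property>
      <name>fs.s3a.fast.upload.active.blocks</name>
      <value>4</value>
    </property>
    <property>
      <name>fs.s3a.max.total.tasks</name>
      <value>5</value>
    </property>
    <!-- With these example values the bound above works out to
         100 MB * (4 active + 5 queued + 1) = 1000 MB of buffered data
         per S3AFileSystem instance. -->
    ```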

> S3ABlockOutputStream to support huge (many GB) file writes
> ----------------------------------------------------------
>                 Key: HADOOP-13560
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13560
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.9.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13560-branch-2-001.patch, 
> HADOOP-13560-branch-2-002.patch, HADOOP-13560-branch-2-003.patch, 
> HADOOP-13560-branch-2-004.patch
> An AWS SDK [issue|https://github.com/aws/aws-sdk-java/issues/367] highlights 
> that metadata isn't copied on large copies.
> 1. Add a test to do that large copy/rename and verify that the copy really 
> works.
> 2. Verify that metadata makes it over.
> Verifying large file rename is important on its own, as it is needed for very 
> large commit operations by committers that use rename.

This message was sent by Atlassian JIRA
