[ 
https://issues.apache.org/jira/browse/HADOOP-13560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15575317#comment-15575317
 ] 

ASF GitHub Bot commented on HADOOP-13560:
-----------------------------------------

Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/hadoop/pull/130#discussion_r83419925
  
    --- Diff: hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md ---
    @@ -881,40 +881,362 @@ Seoul
     If the wrong endpoint is used, the request may fail. This may be reported
     as a 301/redirect error, or as a 400 Bad Request.
     
    -### S3AFastOutputStream
    - **Warning: NEW in hadoop 2.7. UNSTABLE, EXPERIMENTAL: use at own risk**
     
    -    <property>
    -      <name>fs.s3a.fast.upload</name>
    -      <value>false</value>
    -      <description>Upload directly from memory instead of buffering to
    -      disk first. Memory usage and parallelism can be controlled as up to
    -      fs.s3a.multipart.size memory is consumed for each (part)upload actively
    -      uploading (fs.s3a.threads.max) or queueing (fs.s3a.max.total.tasks)</description>
    -    </property>
     
    -    <property>
    -      <name>fs.s3a.fast.buffer.size</name>
    -      <value>1048576</value>
    -      <description>Size (in bytes) of initial memory buffer allocated for an
    -      upload. No effect if fs.s3a.fast.upload is false.</description>
    -    </property>
    +### <a name="s3a_fast_upload"></a>Stabilizing: S3A Fast Upload
    +
    +
    +**New in Hadoop 2.7; significantly enhanced in Hadoop 2.9**
    +
    +
    +Because of the nature of the S3 object store, data written to an S3A
    +`OutputStream` is not written incrementally; instead, by default, it is
    +buffered to disk until the stream is closed in its `close()` method.
    +
    +This can make output slow:
    +
    +* The execution time for `OutputStream.close()` is proportional to the amount
    +of data buffered and inversely proportional to the bandwidth; that is,
    +`O(data/bandwidth)`.
    +* The bandwidth is that available from the host to S3: other work in the same
    +process, server or network at the time of upload may increase the upload time,
    +hence the duration of the `close()` call.
    +* If a process uploading data fails before `OutputStream.close()` is called,
    +all data is lost.
    +* The disks hosting temporary directories defined in `fs.s3a.buffer.dir` must
    +have the capacity to store the entire buffered file.
    +
    +Put succinctly: the further the process is from the S3 endpoint, or the
    +smaller the EC2-hosted VM is, the longer it will take for the work to complete.
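    +
    +As a worked illustration (the numbers here are hypothetical, not benchmarks):
    +buffering 1 GB of output and then uploading it over a link with 10 MB/s of
    +effective upload bandwidth means the `close()` call alone takes on the order
    +of 100 seconds.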
    +
    +This can create problems in application code:
    +
    +* Code often assumes that the `close()` call is fast; the delays can create
    +bottlenecks in operations.
    +* Very slow uploads sometimes cause applications to time out (generally,
    +threads blocking during the upload stop reporting progress, and so trigger
    +timeouts).
    +* Streaming very large amounts of data may consume all disk space before
    +the upload begins.
    +
    +
    +Work to address this began in Hadoop 2.7 with the `S3AFastOutputStream`
    +[HADOOP-11183](https://issues.apache.org/jira/browse/HADOOP-11183), and
    +has continued with `S3ABlockOutputStream`
    +[HADOOP-13560](https://issues.apache.org/jira/browse/HADOOP-13560).
    +
    +
    +This adds an alternative output stream, "S3A Fast Upload", which:
    +
    +1.  Always uploads large files as blocks with the size set by
    +    `fs.s3a.multipart.size`. That is: the threshold at which multipart
    +    uploads begin and the size of each upload are identical.
    +1.  Buffers blocks to disk (default) or in on-heap or off-heap memory.
    +1.  Uploads blocks in parallel in background threads.
    +1.  Begins uploading blocks as soon as the buffered data exceeds this
    +    partition size.
    +1.  When buffering data to disk, uses the directory/directories listed in
    +    `fs.s3a.buffer.dir`. The size of data which can be buffered is limited
    +    to the available disk space.
    +1.  Generates output statistics as metrics on the filesystem, including
    +    statistics of active and pending block uploads.
    +1.  Has the time to `close()` set by the amount of remaining data to upload,
    +    rather than the total size of the file.
    +
    +With incremental writes of blocks, "S3A Fast Upload" offers an upload
    +time at least as fast as the "classic" mechanism, with significant benefits
    +on long-lived output streams, and when very large amounts of data are
    +generated. The in-memory buffering mechanisms may also offer a speedup
    +when running adjacent to S3 endpoints, as disks are not used for
    +intermediate data storage.
    +
    +
    +```xml
    +<property>
    +  <name>fs.s3a.fast.upload</name>
    +  <value>true</value>
    +  <description>
    +    Use the incremental block upload mechanism with
    +    the buffering mechanism set in fs.s3a.fast.upload.buffer.
    +    The number of threads performing uploads in the filesystem is defined
    +    by fs.s3a.threads.max; the queue of waiting uploads is limited by
    +    fs.s3a.max.total.tasks.
    +    The size of each buffer is set by fs.s3a.multipart.size.
    +  </description>
    +</property>
    +
    +<property>
    +  <name>fs.s3a.fast.upload.buffer</name>
    +  <value>disk</value>
    +  <description>
    +    The buffering mechanism to use when using S3A fast upload
    +    (fs.s3a.fast.upload=true). Values: disk, array, bytebuffer.
    +    This configuration option has no effect if fs.s3a.fast.upload is false.
    +
    +    "disk" will use the directories listed in fs.s3a.buffer.dir as
    +    the location(s) to save data prior to being uploaded.
    +
    +    "array" uses arrays in the JVM heap
    +
    +    "bytebuffer" uses off-heap memory within the JVM.
    +
    +    Both "array" and "bytebuffer" will consume memory in a single stream 
up to the number
    +    of blocks set by:
    +
    +        fs.s3a.multipart.size * fs.s3a.fast.upload.active.blocks.
    +
    +    If using either of these mechanisms, keep this value low.
    +
    +    The total number of threads performing work across all streams is set by
    +    fs.s3a.threads.max, with fs.s3a.max.total.tasks values setting the number
    +    of queued work items.
    +  </description>
    +</property>
    +  
    +<property>
    +  <name>fs.s3a.multipart.size</name>
    +  <value>104857600</value>
    +  <description>
    +  How big (in bytes) to split upload or copy operations up into.
    +  </description>
    +</property>
    +
    +<property>
    +  <name>fs.s3a.fast.upload.active.blocks</name>
    +  <value>8</value>
    +  <description>
    +    Maximum number of blocks a single output stream can have
    +    active (uploading, or queued to the central FileSystem
    +    instance's pool of queued operations).
    +
    +    This stops a single stream overloading the shared thread pool.
    +  </description>
    +</property>
    +```
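    +
    +As a usage sketch, these options can also be set programmatically through the
    +standard Hadoop `Configuration` API. The example below is illustrative rather
    +than part of the S3A documentation itself; the bucket name and path are
    +hypothetical, and credentials must be configured separately:
    +
    +```java
    +import java.net.URI;
    +
    +import org.apache.hadoop.conf.Configuration;
    +import org.apache.hadoop.fs.FSDataOutputStream;
    +import org.apache.hadoop.fs.FileSystem;
    +import org.apache.hadoop.fs.Path;
    +
    +public class FastUploadExample {
    +  public static void main(String[] args) throws Exception {
    +    Configuration conf = new Configuration();
    +    // Enable the incremental block upload mechanism.
    +    conf.set("fs.s3a.fast.upload", "true");
    +    // Buffer blocks to disk; alternatives are "array" and "bytebuffer".
    +    conf.set("fs.s3a.fast.upload.buffer", "disk");
    +
    +    // "example-bucket" is a placeholder; use a bucket you own.
    +    FileSystem fs = FileSystem.get(new URI("s3a://example-bucket/"), conf);
    +    try (FSDataOutputStream out = fs.create(new Path("/data/large-file.bin"))) {
    +      byte[] block = new byte[8 * 1024 * 1024];
    +      // Write 1 GB in 8 MB chunks; block uploads begin in the background
    +      // once the buffered data exceeds fs.s3a.multipart.size.
    +      for (int i = 0; i < 128; i++) {
    +        out.write(block);
    +      }
    +    } // close() blocks only until the remaining data has been uploaded.
    +  }
    +}
    +```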
    +
    +**Notes**
    +
    +* If the amount of data written to a stream is below that set in
    +`fs.s3a.multipart.size`, the upload is performed in the
    +`OutputStream.close()` operation, as with the original output stream.
    +
    +* The published Hadoop metrics include the live queue length and upload
    +operation counts, making it possible to identify when there is a backlog
    +of work, or a mismatch between data generation rates and network bandwidth.
    +Per-stream statistics can also be logged by calling `toString()` on the
    +current stream (see the sketch after these notes).
    +
    +* Incremental writes are not visible; the object can only be listed
    +or read when the multipart operation completes in the `close()` call, which
    +will block until the upload is completed.
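    +
    +As a minimal sketch of that statistics logging (it assumes the standard
    +`FSDataOutputStream.getWrappedStream()` accessor; exactly what `toString()`
    +prints depends on the stream implementation in use):
    +
    +```java
    +import org.apache.hadoop.fs.FSDataOutputStream;
    +import org.apache.hadoop.fs.FileSystem;
    +import org.apache.hadoop.fs.Path;
    +
    +// Illustrative helper, not part of the S3A API: write some data, then
    +// print the stream statistics while the stream is still open.
    +public final class StreamStatsLogger {
    +  public static void writeAndLog(FileSystem fs, Path path) throws Exception {
    +    try (FSDataOutputStream out = fs.create(path)) {
    +      out.write(new byte[64 * 1024 * 1024]);
    +      // getWrappedStream() returns the underlying S3A output stream;
    +      // its toString() includes the block upload statistics.
    +      System.out.println("Stream statistics: " + out.getWrappedStream());
    +    }
    +  }
    +}
    +```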
    +
    +
    +#### <a name="s3a_fast_upload_disk"></a>Fast Upload with Disk Buffers 
`fs.s3a.fast.upload.buffer=disk`
    +
    +When `fs.s3a.fast.upload.buffer` is set to `disk`, all data is buffered
    +to local hard disks prior to upload. This minimizes the amount of memory
    +consumed, and so eliminates heap size as the limiting factor in queued
    +uploads, exactly as with the original "direct to disk" buffering used when
    +`fs.s3a.fast.upload=false`.
    +
    +
    +```xml
    +<property>
    +  <name>fs.s3a.fast.upload</name>
    +  <value>true</value>
    +</property>
    +  
    +<property>
    +  <name>fs.s3a.fast.upload.buffer</name>
    +  <value>disk</value>
    +</property>
    +
    +```
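    +
    +Note that `fs.s3a.buffer.dir` accepts a comma-separated list of directories
    +(for example, `/disk1/s3a,/disk2/s3a`; these paths are placeholders), so disk
    +buffering can be spread across multiple physical disks. Buffered blocks are
    +deleted as their uploads complete.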
    +
    +
    +#### <a name="s3a_fast_upload_bytebuffer"></a>Fast Upload with 
ByteBuffers: `fs.s3a.fast.upload.buffer=bytebuffer`
    +
    +When `fs.s3a.fast.upload.buffer` is set to `bytebuffer`, all data is
    +buffered in "Direct" ByteBuffers prior to upload. This *may* be faster
    +than buffering to disk; it also avoids consuming local disk space, which
    +matters on hosts (for example, tiny EC2 VMs) where there may not be much
    +disk space to buffer with.
    +
    +The ByteBuffers are created in the memory of the JVM, but not in the
    +Java heap itself. The amount of data which can be buffered is limited
    +by the Java runtime, the operating system, and, for YARN applications,
    +the amount of memory requested for each container.
    +
    +The slower the write bandwidth to S3, the greater the risk of running out
    +of memory.
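    +
    +As a worked example using the values quoted earlier in this document: with
    +`fs.s3a.multipart.size` at 104857600 bytes (100 MB) and
    +`fs.s3a.fast.upload.active.blocks` at 8, a single stream may buffer up to
    +800 MB off-heap; ten streams open simultaneously could demand roughly 8 GB.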
    +
    +
    +```xml
    +<property>
    +  <name>fs.s3a.fast.upload</name>
    +  <value>true</value>
    +</property>
    +  
    +<property>
    +  <name>fs.s3a.fast.upload.buffer</name>
    +  <value>bytebuffer</value>
    +</property>
    +```
    +
    +#### <a name="s3a_fast_upload_array"></a>Fast Upload with Arrays: 
`fs.s3a.fast.upload.buffer=array`
    +
    +When `fs.s3a.fast.upload.buffer` is set to `array`, all data is buffered
    +in byte arrays in the JVM's heap prior to upload.
    +This *may* be faster than buffering to disk.
    +
    +This `array` option is similar to the in-memory-only stream offered in
    +Hadoop 2.7 with `fs.s3a.fast.upload=true`.
    +
    +The amount of data which can be buffered is limited by the available
    +size of the JVM heap. The slower the write bandwidth to S3, the greater
    +the risk of heap overflows.
    --- End diff --
    
    adding link to the s3a_fast_upload_thread_tuning section


> S3ABlockOutputStream to support huge (many GB) file writes
> ----------------------------------------------------------
>
>                 Key: HADOOP-13560
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13560
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.9.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13560-branch-2-001.patch, 
> HADOOP-13560-branch-2-002.patch, HADOOP-13560-branch-2-003.patch, 
> HADOOP-13560-branch-2-004.patch
>
>
> An AWS SDK [issue|https://github.com/aws/aws-sdk-java/issues/367] highlights 
> that metadata isn't copied on large copies.
> 1. Add a test to do that large copy/rename and verify that the copy really 
> works.
> 2. Verify that metadata makes it over.
> Verifying large file rename is important on its own, as it is needed for very 
> large commit operations for committers using rename.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
