[ 
https://issues.apache.org/jira/browse/HADOOP-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167051#comment-14167051
 ] 

Thomas Demoor commented on HADOOP-11183:
----------------------------------------

h4. Overview
Indeed, I do not immediately see any data integrity regressions.  The current 
file-based implementation does not leverage the durability provided by the 
disks. 

The key differences between object stores and HDFS to keep in mind here are 
that an object is immutable (thus no appends) and that an object only 
"starts existing" after close() returns (unlike HDFS, there is no 
block-by-block transfer, and earlier blocks cannot be read while the last 
block is still being written). 

One can reduce the buffer space needed by using MultipartUpload and realizing 
that we can start uploading before the file is closed: once the buffer reaches 
the partSizeThreshold, one can initiate a MultipartUpload and start uploading 
parts. This will (probably) require use of the [low-level AWS API | 
http://docs.aws.amazon.com/AmazonS3/latest/dev/mpListPartsJavaAPI.html] instead 
of the currently used high-level API (TransferManager), so we will need to do 
some bookkeeping (part number, etc.) ourselves. 

h4. A *rough* algorithm
Write calls append to the in-memory "buffer" (ByteArrayOutputStream?). 
If close() is called while buffer.size < partSizeThreshold: do a regular upload 
(using TransferManager?). 
Else, once a write causes buffer.size >= partSizeThreshold:
* initiate a multipart upload: upload the current buffer contents in parts of 
partSize and "resize" the buffer to length=partSize
* subsequent writes fill up the buffer; each time partSize is reached, transfer 
a part
* close() flushes the remaining buffer, waits for all parts to be uploaded and 
completes the MultipartUpload
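
The steps above can be sketched roughly as follows. This is only an 
illustration of the buffering logic, not the proposed patch: the class name 
BufferedPartOutputStream and the Uploader interface are hypothetical stand-ins 
for the real S3 calls, part transfers block the writer, and the sketch ignores 
the 5MB minimum part size as well as the temporary copy made when splitting 
the buffer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

/** Hypothetical sketch: buffers writes in memory and hands off full parts. */
class BufferedPartOutputStream extends OutputStream {

    /** Hypothetical stand-in for the S3 calls; not a real AWS SDK interface. */
    interface Uploader {
        void putObject(byte[] data) throws IOException;           // regular upload
        void startMultipart() throws IOException;                 // initiate MPU
        void uploadPart(int partNumber, byte[] data) throws IOException;
        void completeMultipart(int partCount) throws IOException;
    }

    private final Uploader uploader;
    private final int partSizeThreshold;
    private final int partSize;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private boolean multipart = false;
    private int nextPartNumber = 1;   // bookkeeping we have to do ourselves
    private boolean closed = false;

    BufferedPartOutputStream(Uploader uploader, int partSizeThreshold, int partSize) {
        if (partSizeThreshold <= 0 || partSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        this.uploader = uploader;
        this.partSizeThreshold = partSizeThreshold;
        this.partSize = partSize;
    }

    @Override
    public void write(int b) throws IOException {
        buffer.write(b);
        if (!multipart && buffer.size() >= partSizeThreshold) {
            uploader.startMultipart();   // threshold reached: switch to multipart
            multipart = true;
        }
        if (multipart) {
            flushFullParts();
        }
    }

    /** Uploads every complete part in the buffer (blocking) and keeps the rest. */
    private void flushFullParts() throws IOException {
        while (buffer.size() >= partSize) {
            byte[] all = buffer.toByteArray();
            uploader.uploadPart(nextPartNumber++, Arrays.copyOfRange(all, 0, partSize));
            buffer.reset();
            buffer.write(all, partSize, all.length - partSize);
        }
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        if (!multipart) {
            uploader.putObject(buffer.toByteArray());   // small file: single upload
        } else {
            if (buffer.size() > 0) {
                uploader.uploadPart(nextPartNumber++, buffer.toByteArray());
            }
            uploader.completeMultipart(nextPartNumber - 1);
        }
    }
}
```

The sketch relies on the default OutputStream.write(byte[], int, int), which 
calls write(int) once per byte; a real implementation would override the bulk 
write for speed.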

DESIGN DECISION: Transferring parts could block (enabling buffer re-use) or be 
asynchronous with a thread pool (buffer per thread -> requires more memory), as 
in TransferManager. For the following I assume the former.
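
For comparison, the asynchronous alternative might look roughly like the 
sketch below. The class AsyncPartTransfer and the uploadPart callback are 
hypothetical; the point is only that every submitted part needs its own copy 
of the data, because the writer keeps filling the buffer while the transfer 
is in flight.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

/** Hypothetical sketch of non-blocking part transfers using a thread pool. */
class AsyncPartTransfer {
    private final ExecutorService pool;
    private final Consumer<byte[]> uploadPart;   // hypothetical upload callback
    private final List<Future<?>> pending = new ArrayList<>();

    AsyncPartTransfer(int threads, Consumer<byte[]> uploadPart) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.uploadPart = uploadPart;
    }

    /** Queues a part for upload; the copy is what costs the extra memory. */
    void submit(byte[] part) {
        byte[] copy = part.clone();
        pending.add(pool.submit(() -> uploadPart.accept(copy)));
    }

    /** Called from close(): waits until every queued part has been uploaded. */
    void awaitAll() throws InterruptedException, ExecutionException {
        try {
            for (Future<?> f : pending) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Note that a fixed thread pool has an unbounded work queue, so this sketch does 
not actually cap memory at one buffer per thread; a real implementation would 
have to block or reject submissions once enough parts are in flight.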

h4. Memory usage
Maximum amount of used memory = (number of open files) x 
max(partSizeThreshold, partSize)
The minimum partSize(Threshold) is 5MB, but bigger parts evidently increase 
throughput. How many files could one expect to be open at the same time for 
different sensible (e.g. not HBase) use cases? We should provide documentation 
helping users choose values for partSizeThreshold and partSize according to 
their use case and available memory. Furthermore, we probably want to keep both 
implementations around and introduce a config setting to choose between them.
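
To make the bound concrete, here is a small worked example of the formula 
above. The class and method names are hypothetical, and the numbers are 
illustrative rather than recommended settings.

```java
/** Hypothetical helper: worst-case buffer memory for the blocking design. */
class MemoryEstimate {
    /** (number of open files) x max(partSizeThreshold, partSize), in bytes. */
    static long maxBufferBytes(int openFiles, long partSizeThreshold, long partSize) {
        // multiplyExact throws ArithmeticException instead of silently overflowing
        return Math.multiplyExact((long) openFiles, Math.max(partSizeThreshold, partSize));
    }
}
```

For instance, 100 open files with partSizeThreshold = 16MB and partSize = 5MB 
could hold up to 100 x 16MB = 1600MB in buffers.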

I will submit a patch that lays out these ideas so we have a starting point to 
kick off the discussion.


> Memory-based S3AOutputstream
> ----------------------------
>
>                 Key: HADOOP-11183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11183
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>    Affects Versions: 2.6.0
>            Reporter: Thomas Demoor
>
> Currently s3a buffers files on disk(s) before uploading. This JIRA 
> investigates adding a memory-based upload implementation.
> The motivation is evidently performance: this would be beneficial for users 
> with high network bandwidth to S3 (EC2?) or users that run Hadoop directly on 
> an S3-compatible object store (FYI: my contributions are made on behalf of 
> Amplidata). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
