Hi All,

  I am proposing an S3 output copy module. The primary functionality of this
module is uploading files to an S3 bucket using a block-by-block approach.

  Below is the JIRA created for this task:
https://issues.apache.org/jira/browse/APEXMALHAR-2022

  The design of this module is similar to that of the HDFS copy module, so I
will extend the HDFS copy module for S3.

Design of this Module:
=======================
1) Write the blocks into HDFS.
2) Merge the blocks into a file.
3) Upload the merged file into the S3 bucket using the AmazonS3Client APIs.

Steps (1) and (2) are the same as in the HDFS copy module.
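For concreteness, steps (1) and (2) might look like the following sketch. It
uses the local filesystem in place of HDFS, and the class and method names are
illustrative only, not the actual Malhar operator API:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch: write fixed-size blocks, then merge them into one file.
// In the real module these would be HDFS paths written via the Hadoop
// FileSystem API; here plain java.nio.file stands in for HDFS.
public class BlockMergeSketch {

    static final int BLOCK_SIZE = 4; // tiny block size, for demonstration only

    // Step (1): split the input into block files (block-by-block write).
    public static List<Path> writeBlocks(byte[] input, Path dir) throws IOException {
        List<Path> blocks = new ArrayList<>();
        for (int off = 0, i = 0; off < input.length; off += BLOCK_SIZE, i++) {
            int len = Math.min(BLOCK_SIZE, input.length - off);
            Path block = dir.resolve("block-" + i);
            Files.write(block, Arrays.copyOfRange(input, off, off + len));
            blocks.add(block);
        }
        return blocks;
    }

    // Step (2): concatenate the blocks, in order, into the merged file.
    public static void mergeBlocks(List<Path> blocks, Path merged) throws IOException {
        try (OutputStream out = Files.newOutputStream(merged)) {
            for (Path block : blocks) {
                Files.copy(block, out);
            }
        }
    }
}
```

The merged file produced by step (2) is what step (3) hands to the S3 upload.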

*Limitation:* The supported file size is up to 5 GB, the maximum size of a
single PUT operation. Please refer to the link below about the limitations of
uploading objects into S3:
http://docs.aws.amazon.com/AmazonS3/latest/dev/UploadingObjects.html

We can resolve the above limitation by using the S3 multipart upload feature,
which supports objects up to 5 TB. I will add multipart support in the next
iteration.
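As a rough sketch of what multipart support involves, the following shows only
the part-size planning (S3 requires each part except the last to be at least
5 MB, with at most 10,000 parts per upload); the actual SDK calls
(initiateMultipartUpload, uploadPart, completeMultipartUpload) are omitted, and
the class name is hypothetical:

```java
// Illustrative sketch of the multipart partitioning logic only.
// S3 multipart constraints: each part except the last must be >= 5 MB,
// and a single upload may have at most 10,000 parts.
public class MultipartPlanSketch {

    static final long MIN_PART_SIZE = 5L * 1024 * 1024; // 5 MB
    static final int MAX_PARTS = 10_000;

    // Choose a part size that keeps the part count within the S3 limit,
    // doubling from the 5 MB minimum as needed for very large files.
    public static long choosePartSize(long fileSize) {
        long partSize = MIN_PART_SIZE;
        while ((fileSize + partSize - 1) / partSize > MAX_PARTS) {
            partSize *= 2;
        }
        return partSize;
    }

    // Number of parts the file splits into at the given part size.
    public static int partCount(long fileSize, long partSize) {
        return (int) ((fileSize + partSize - 1) / partSize);
    }
}
```

With this plan, each part can be uploaded independently (and retried on
failure), which is what removes the 5 GB single-PUT ceiling.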

 Please share your thoughts on this.

Regards,
Chaitanya
