Writing to S3 is a common use case for applications. This module will definitely be helpful.
+1 for adding this module.

~ Yogi

On 22 March 2016 at 13:52, Chaitanya Chebolu <[email protected]> wrote:

> Hi All,
>
> I am proposing an S3 output copy module. The primary functionality of this
> module is uploading files to an S3 bucket using a block-by-block approach.
>
> Below is the JIRA created for this task:
> https://issues.apache.org/jira/browse/APEXMALHAR-2022
>
> The design of this module is similar to the HDFS copy module, so I will
> extend the HDFS copy module for S3.
>
> Design of this Module:
> =======================
> 1) Write the blocks into HDFS.
> 2) Merge the blocks into a file.
> 3) Upload the merged file into the S3 bucket using the AmazonS3Client APIs.
>
> Steps (1) & (2) are the same as in the HDFS copy module.
>
> *Limitation:* Supports files up to 5 GB in size. Please refer to the
> link below about the limitations of uploading objects into S3:
> http://docs.aws.amazon.com/AmazonS3/latest/dev/UploadingObjects.html
>
> We can resolve the above limitation by using the S3 Multipart feature. I will
> add multipart support in the next iteration.
>
> Please share your thoughts on this.
>
> Regards,
> Chaitanya
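
For illustration, steps (2) and (3) of the proposed design could look roughly like the sketch below. This is only a sketch under assumptions, not the module's actual code: the names `BlockMergeSketch`, `Uploader`, `mergeBlocks`, `fitsSinglePut` and `mergeAndUpload` are hypothetical, the blocks are read from the local file system rather than HDFS, the 5 GB limit is taken as 5 * 1024^3 bytes, and the upload is a placeholder hook where a real implementation would call the AmazonS3Client API.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class BlockMergeSketch {
    // Single-upload size limit mentioned in the proposal (assumed here to be 5 * 1024^3 bytes).
    static final long MAX_SINGLE_UPLOAD_BYTES = 5L * 1024 * 1024 * 1024;

    /** Hypothetical upload hook; a real module would call AmazonS3Client here. */
    interface Uploader {
        void upload(Path mergedFile) throws IOException;
    }

    /** Step 2: concatenate the block files, in order, into one merged file. */
    static long mergeBlocks(List<Path> blocks, Path merged) throws IOException {
        long total = 0;
        try (OutputStream out = Files.newOutputStream(merged)) {
            for (Path block : blocks) {
                // Files.copy returns the number of bytes written for this block.
                total += Files.copy(block, out);
            }
        }
        return total;
    }

    /** True if a file of this size can go up in a single (non-multipart) upload. */
    static boolean fitsSinglePut(long sizeBytes) {
        return sizeBytes <= MAX_SINGLE_UPLOAD_BYTES;
    }

    /** Steps 2 and 3: merge the blocks, check the size limit, then hand off to the uploader. */
    static void mergeAndUpload(List<Path> blocks, Path merged, Uploader uploader)
            throws IOException {
        long size = mergeBlocks(blocks, merged);
        if (!fitsSinglePut(size)) {
            throw new IOException("Merged file exceeds 5 GB; multipart upload would be needed");
        }
        uploader.upload(merged);
    }
}
```

The size check sits between the merge and the upload, so a file over the limit fails before any transfer starts. That is also the point where multipart support could be added later.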
