Hi All,

I am proposing an S3 output copy module. The primary functionality of this module is uploading files to an S3 bucket using a block-by-block approach.
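As a rough sketch of the block-by-block idea, the snippet below splits a payload into fixed-size blocks, writes each block as its own file, and then merges them back into a single file. This is only an illustration under my assumptions: it uses the local filesystem as a stand-in for HDFS, and the block size, file names, and `BlockMerge` class are all hypothetical, not part of the actual module.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch of steps (1) and (2): write fixed-size blocks, then merge them
// into a single file before the S3 upload. Local files stand in for HDFS.
public class BlockMerge {
    static final int BLOCK_SIZE = 4; // tiny block size, for illustration only

    // Step (1): split the payload into blocks and write each block as a file.
    static void writeBlocks(byte[] payload, Path dir) throws IOException {
        Files.createDirectories(dir);
        int blockNo = 0;
        for (int off = 0; off < payload.length; off += BLOCK_SIZE) {
            int len = Math.min(BLOCK_SIZE, payload.length - off);
            byte[] block = new byte[len];
            System.arraycopy(payload, off, block, 0, len);
            Files.write(dir.resolve("block-" + blockNo++), block);
        }
    }

    // Step (2): concatenate the blocks, in order, into one merged file.
    static void mergeBlocks(Path dir, Path merged) throws IOException {
        Files.deleteIfExists(merged);
        Files.createFile(merged);
        int blockNo = 0;
        Path block;
        while (Files.exists(block = dir.resolve("block-" + blockNo++))) {
            Files.write(merged, Files.readAllBytes(block), StandardOpenOption.APPEND);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("blocks");
        Path merged = dir.resolve("merged");
        writeBlocks("hello s3 module".getBytes(), dir);
        mergeBlocks(dir, merged);
        // Step (3) would then upload the merged file, e.g. (hypothetical
        // bucket and key, AWS SDK for Java v1):
        // new AmazonS3Client(credentials)
        //     .putObject("my-bucket", "output/merged", merged.toFile());
        System.out.println(new String(Files.readAllBytes(merged)));
    }
}
```

In the real module, steps (1) and (2) would be handled by the existing HDFS copy module; only the final upload call is new.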
Below is the JIRA created for this task:
https://issues.apache.org/jira/browse/APEXMALHAR-2022

The design of this module is similar to the HDFS copy module, so I will extend the HDFS copy module for S3.

Design of this Module:
=======================
1) Write blocks into HDFS.
2) Merge the blocks into a file.
3) Upload the merged file into an S3 bucket using the AmazonS3Client APIs.

Steps (1) and (2) are the same as in the HDFS copy module.

*Limitation:* Supports files up to 5 GB in size. Please refer to the link below about the limitations of uploading objects into S3:
http://docs.aws.amazon.com/AmazonS3/latest/dev/UploadingObjects.html

We can resolve the above limitation by using the S3 multipart upload feature. I will add multipart support in the next iteration.

Please share your thoughts on this.

Regards,
Chaitanya
