+1 regarding the S3 upload functionality. However, I think we should focus on multipart upload directly, as it comes with several advantages: higher throughput, faster recovery (only a failed part needs to be retried), and not needing to wait for the entire file to be created before uploading each part. See: http://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html
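
A minimal sketch of the "no need to wait for the entire file" point, assuming blocks arrive one at a time as byte arrays. The names MultipartSketch, PartBuffer and PartSink are hypothetical and made up for illustration; they are not part of Malhar or the AWS SDK. PartSink only stands in for whatever upload-part call the real operator would make, and initiating and completing the multipart upload are left out.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

class MultipartSketch {
    // S3 requires every part except the last to be at least 5 MB.
    static final int MIN_PART_SIZE = 5 * 1024 * 1024;

    // Hypothetical stand-in for the real "upload part" call to S3.
    interface PartSink {
        void uploadPart(int partNumber, byte[] data);
    }

    // Buffers incoming blocks and emits a part once enough bytes have accumulated.
    static class PartBuffer {
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        private final PartSink sink;
        private final int minPartSize;
        private int nextPartNumber = 1; // S3 part numbers start at 1

        PartBuffer(PartSink sink, int minPartSize) {
            this.sink = sink;
            this.minPartSize = minPartSize;
        }

        // Called for each block as it arrives; no need to wait for the whole file.
        void addBlock(byte[] block) {
            buffer.write(block, 0, block.length);
            if (buffer.size() >= minPartSize) {
                flush();
            }
        }

        // Called once after the last block; the final part may be below the minimum.
        void finish() {
            if (buffer.size() > 0) {
                flush();
            }
        }

        private void flush() {
            sink.uploadPart(nextPartNumber++, buffer.toByteArray());
            buffer.reset();
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        PartBuffer parts = new PartBuffer(
            (n, data) -> log.add("part " + n + ": " + data.length + " bytes"),
            MIN_PART_SIZE);
        byte[] block = new byte[2 * 1024 * 1024]; // pretend each block is 2 MB
        for (int i = 0; i < 5; i++) {
            parts.addBlock(block);
        }
        parts.finish();
        log.forEach(System.out::println);
    }
}
```

With five 2 MB blocks, the first part goes out as soon as the third block arrives (6 MB buffered), and the remaining 4 MB is sent as the last part on finish().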

Also, it seems we can do multipart upload once the file size is more than 5 MB (the minimum size for every part except the last). They do recommend using multipart if the file size is more than 100 MB. I am not sure if there is a hard lower limit, though. See: http://docs.aws.amazon.com/AmazonS3/latest/dev/UploadingObjects.html

This way, it seems we don't have to wait until a file is completely written to HDFS before performing the upload operation.

Regards,
Ashwin.

On Wed, Mar 23, 2016 at 5:10 AM, Tushar Gosavi <[email protected]> wrote:

> +1, we need this functionality.
>
> Is it going to be a single operator or multiple operators? If multiple
> operators, then can you explain what functionality each operator will
> provide?
>
> Regards,
> -Tushar.
>
> On Wed, Mar 23, 2016 at 5:01 PM, Yogi Devendra <[email protected]>
> wrote:
>
> > Writing to S3 is a common use-case for applications.
> > This module will definitely be helpful.
> >
> > +1 for adding this module.
> >
> > ~ Yogi
> >
> > On 22 March 2016 at 13:52, Chaitanya Chebolu <[email protected]>
> > wrote:
> >
> > > Hi All,
> > >
> > > I am proposing an S3 output copy module. The primary functionality of
> > > this module is uploading files to an S3 bucket using a block-by-block
> > > approach.
> > >
> > > Below is the JIRA created for this task:
> > > https://issues.apache.org/jira/browse/APEXMALHAR-2022
> > >
> > > The design of this module is similar to the HDFS copy module, so I
> > > will extend the HDFS copy module for S3.
> > >
> > > Design of this Module:
> > > =======================
> > > 1) Write the blocks into HDFS.
> > > 2) Merge the blocks into a file.
> > > 3) Upload the merged file into the S3 bucket using the AmazonS3Client
> > > APIs.
> > >
> > > Steps (1) & (2) are the same as in the HDFS copy module.
> > >
> > > *Limitation:* Supports file sizes only up to 5 GB. Please refer to the
> > > link below about the limitations of uploading objects into S3:
> > > http://docs.aws.amazon.com/AmazonS3/latest/dev/UploadingObjects.html
> > >
> > > We can resolve the above limitation by using the S3 multipart feature.
> > > I will add multipart support in the next iteration.
> > >
> > > Please share your thoughts on this.
> > >
> > > Regards,
> > > Chaitanya

--
Regards,
Ashwin.
