[ 
https://issues.apache.org/jira/browse/HADOOP-9454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913374#comment-13913374
 ] 

Aaron T. Myers commented on HADOOP-9454:
----------------------------------------

Akira / Jordan - thanks a lot for working on this. In the abstract I'm happy to
check this change into Hadoop, but I don't consider myself especially well
qualified to review it, since I'm not very familiar with S3/jets3t. I've asked
[~amansk] to take a look, since he's been involved with some of the more recent
work around upgrading jets3t. If it looks good to him, I'll check it in, in
which case it'll likely first show up in Hadoop 2.4.0.

Jordan - would you have any interest in contributing your S3A implementation
to Hadoop? Is it Apache licensed? If so, we should file a new JIRA to get that 
checked in alongside the existing S3 and S3N FS implementations.

> Support multipart uploads for s3native
> --------------------------------------
>
>                 Key: HADOOP-9454
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9454
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>            Reporter: Jordan Mendelson
>            Assignee: Akira AJISAKA
>         Attachments: HADOOP-9454-10.patch, HADOOP-9454-11.patch, 
> HADOOP-9454-12.patch
>
>
> The s3native filesystem is limited to 5 GB file uploads to S3; however, the
> newest version of jets3t supports multipart uploads to allow storing multi-TB
> files. While the s3 filesystem lets you bypass this restriction by uploading
> blocks, we need to output our data into Amazon's publicdatasets bucket, which
> is shared with others.
> Amazon has added a similar feature to their distribution of Hadoop, as has
> MapR.
> Please note that while this supports large copies, it does not yet support
> parallel copies: unlike for uploads, jets3t does not yet expose an API that
> allows them without Hadoop controlling the threads.
> By default, this patch does not enable multipart uploads. To enable them and
> parallel uploads, add the following keys to your Hadoop config:
> <property>
>   <name>fs.s3n.multipart.uploads.enabled</name>
>   <value>true</value>
> </property>
> <property>
>   <name>fs.s3n.multipart.uploads.block.size</name>
>   <value>67108864</value>
> </property>
> <property>
>   <name>fs.s3n.multipart.copy.block.size</name>
>   <value>5368709120</value>
> </property>
> Then create a /etc/hadoop/conf/jets3t.properties file with contents similar to:
> storage-service.internal-error-retry-max=5
> storage-service.disable-live-md5=false
> threaded-service.max-thread-count=20
> threaded-service.admin-max-thread-count=20
> s3service.max-thread-count=20
> s3service.admin-max-thread-count=20
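
For a rough sense of what the block sizes quoted above imply, here is a small
sketch of the arithmetic. It assumes that fs.s3n.multipart.uploads.block.size
is used directly as the part size of a multipart upload, and it relies on the
S3 limit of 10,000 parts per multipart upload; the helper names are
hypothetical and not part of the patch or of jets3t.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Values copied from the configuration quoted above.
UPLOAD_BLOCK_SIZE = 67108864    # fs.s3n.multipart.uploads.block.size
COPY_BLOCK_SIZE = 5368709120    # fs.s3n.multipart.copy.block.size

# S3 limit on the number of parts in a single multipart upload.
S3_MAX_PARTS = 10_000


def part_count(file_size, block_size):
    """Parts needed to cover file_size bytes using block_size-byte parts."""
    if file_size <= 0:
        return 0
    # Integer ceiling division, so a trailing partial block counts as a part.
    return (file_size + block_size - 1) // block_size


def max_object_size(block_size):
    """Largest object that fits in one multipart upload at this part size."""
    return block_size * S3_MAX_PARTS


if __name__ == "__main__":
    print("upload block size (MiB):", UPLOAD_BLOCK_SIZE // MIB)
    print("copy block size (GiB):", COPY_BLOCK_SIZE // GIB)
    print("parts for a 5 GiB file:", part_count(5 * GIB, UPLOAD_BLOCK_SIZE))
    print("max object size (GiB):", max_object_size(UPLOAD_BLOCK_SIZE) // GIB)
```

Under these assumptions, a 64 MiB part size caps a single object at 625 GiB,
so truly multi-TB files would need a larger
fs.s3n.multipart.uploads.block.size.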



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
