[ https://issues.apache.org/jira/browse/HADOOP-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161499#comment-17161499 ]
Steve Loughran commented on HADOOP-17139:
-----------------------------------------
Yeah, it was broken. It worked well for a single source file, but missed the *small*
detail that it also needed to handle directories.
I cut it out in an emergency and never got round to reinstating it, as I didn't
think it was that critical a path for our work.
The commented-out implementation uses the AWS Transfer Manager to do the upload:
it splits the upload into parts, uploads more than one part in parallel, and
switches to multipart as needed. But we should have a test to verify that (if we
can write a test which doesn't involve a 5 GB file).
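For reference, a rough sketch of what that Transfer Manager path looks like
against the v1 AWS SDK; the bucket, key, local path and thresholds below are
made-up illustrations, not values taken from the S3A code:
{code:java}
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;
import com.amazonaws.services.s3.transfer.Upload;

import java.io.File;

public class TransferManagerUploadSketch {
  public static void main(String[] args) throws Exception {
    AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

    // The Transfer Manager splits a large upload into parts and runs them in
    // parallel, falling back to a single PUT below the multipart threshold.
    TransferManager tm = TransferManagerBuilder.standard()
        .withS3Client(s3)
        .withMultipartUploadThreshold(64L * 1024 * 1024)   // hypothetical threshold
        .build();
    try {
      // Hypothetical bucket, key and local path, purely for illustration.
      Upload upload = tm.upload("example-bucket", "dest/key",
          new File("/tmp/source-file"));
      upload.waitForCompletion();   // blocks until every part has been uploaded
    } finally {
      tm.shutdownNow(false);        // false: don't tear down the shared S3 client
    }
  }
}
{code}
Lowering the multipart threshold (and the minimum part size, which S3 won't
accept below 5 MB anyway) in a test configuration should force the multipart
path on a file of a few tens of MB, rather than needing a 5 GB upload.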
> Re-enable optimized copyFromLocal implementation in S3AFileSystem
> -----------------------------------------------------------------
>
> Key: HADOOP-17139
> URL: https://issues.apache.org/jira/browse/HADOOP-17139
> Project: Hadoop Common
> Issue Type: Sub-task
> Reporter: Sahil Takiar
> Priority: Major
>
> It looks like HADOOP-15932 disabled the optimized copyFromLocal
> implementation in S3A for correctness reasons. innerCopyFromLocalFile should
> be fixed and re-enabled. The current implementation uses
> FileSystem.copyFromLocal which will open an input stream from the local fs
> and an output stream to the destination fs, and then call IOUtils.copyBytes.
> With default configs, this will cause S3A to read the file into memory, write
> it back to a file on the local fs, and then when the file is closed, upload
> it to S3.
> The optimized version of copyFromLocal in innerCopyFromLocalFile directly
> creates a PutObjectRequest with the local file as its input.
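For contrast, a minimal sketch of that direct-put path (again assuming the v1
AWS SDK; the bucket, key and local path are invented for illustration), where
the PutObjectRequest is built from the local File so the SDK can stream it from
disk with a known content length:
{code:java}
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.PutObjectRequest;
import com.amazonaws.services.s3.model.PutObjectResult;

import java.io.File;

public class PutObjectFromFileSketch {
  public static void main(String[] args) {
    AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

    // Building the request from the File itself lets the SDK read straight from
    // disk, instead of copying the bytes through an input/output stream pair.
    PutObjectRequest put = new PutObjectRequest(
        "example-bucket", "dest/key", new File("/tmp/source-file"));
    PutObjectResult result = s3.putObject(put);
    System.out.println("ETag: " + result.getETag());
  }
}
{code}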