[ https://issues.apache.org/jira/browse/HADOOP-13600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15490020#comment-15490020 ]
Steve Loughran commented on HADOOP-13600:
-----------------------------------------
Rather than naively issuing the copy calls in the order the listing came back,
we should sort them by file size, largest first.
Why? Assuming there is thread capacity, the largest files would all start
copying at once; as the smaller of those copies complete, the next copies can
start while the biggest copy is still in progress.
This would be faster than a list-ordered approach whenever the listing contains
a mix of large and small blobs.
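
A minimal Java sketch of the ordering idea, not the S3A code itself. Blob,
largestFirst(), copyAll() and makespan() are hypothetical names invented for
illustration, the copy is passed in as a callback rather than being a real S3
call, and makespan() assumes copy time is simply proportional to size.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

class LargestFirstCopy {

    /** Hypothetical stand-in for one listing entry: object key plus length in bytes. */
    static final class Blob {
        final String key;
        final long size;
        Blob(String key, long size) { this.key = key; this.size = size; }
    }

    /** Returns a new list ordered largest first; the input list is not modified. */
    static List<Blob> largestFirst(List<Blob> listing) {
        List<Blob> sorted = new ArrayList<>(listing);
        sorted.sort(Comparator.comparingLong((Blob b) -> b.size).reversed());
        return sorted;
    }

    /** Submits one copy task per blob, largest first, then waits for all of them. */
    static void copyAll(List<Blob> listing, int threads, Consumer<Blob> copyOne)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Blob b : largestFirst(listing)) {
                pending.add(pool.submit(() -> copyOne.accept(b)));
            }
            for (Future<?> f : pending) {
                f.get(); // rethrows a failed copy as ExecutionException
            }
        } finally {
            pool.shutdown();
        }
    }

    /**
     * Estimated total duration when copies are started in the given order and
     * each one goes to whichever worker frees up first. Assumes (for
     * illustration only) that a copy takes time equal to its size.
     */
    static long makespan(List<Blob> order, int threads) {
        PriorityQueue<Long> freeAt = new PriorityQueue<>();
        for (int i = 0; i < threads; i++) {
            freeAt.add(0L);
        }
        long end = 0;
        for (Blob b : order) {
            long finish = freeAt.poll() + b.size;
            freeAt.add(finish);
            end = Math.max(end, finish);
        }
        return end;
    }
}
```

With two threads and a listing that came back as sizes 1, 1, 10, makespan()
gives 11 in listing order (the big copy only starts once a small one has
finished) and 10 after largestFirst(), which is just the duration of the
longest copy.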
> S3a rename() to copy files in a directory in parallel
> -----------------------------------------------------
>
> Key: HADOOP-13600
> URL: https://issues.apache.org/jira/browse/HADOOP-13600
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.7.3
> Reporter: Steve Loughran
>
> Currently a directory rename does a one-by-one copy, making the request
> O(files * data). If the copy operations were launched in parallel, the
> duration of the copy may be reducible to the duration of the longest copy.
> For a directory with many files, this will be significant.