[
https://issues.apache.org/jira/browse/HADOOP-13600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16795337#comment-16795337
]
Steve Loughran commented on HADOOP-13600:
-----------------------------------------
I'm reviewing this again. Nominally, the S3 transfer manager is parallelized
anyway.
But if a multi-GB copy is taking place there, all the small copy operations
which follow it are held up, even though many of them could be executed. So
yes, we do need something to do work in batches.
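To make that concrete, here's a minimal sketch of what "work in batches"
could look like (illustrative only: ObjectCopier is a hypothetical stand-in
for the S3A single-object COPY call, and the fixed-size pool is just one
possible batching policy, not the actual S3A code):
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelRenameSketch {

  /** Hypothetical stand-in for the S3A single-object COPY call. */
  interface ObjectCopier {
    void copyObject(String srcKey, String dstKey) throws Exception;
  }

  /**
   * Submit every per-object copy to a bounded pool so that one
   * multi-GB copy no longer blocks the many small copies behind it.
   */
  static void parallelCopy(ObjectCopier copier, List<String> srcKeys,
      String srcPrefix, String dstPrefix, int threads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<?>> pending = new ArrayList<>();
      for (String src : srcKeys) {
        String dst = dstPrefix + src.substring(srcPrefix.length());
        pending.add(pool.submit(() -> {
          copier.copyObject(src, dst);
          return null;           // Callable, so checked exceptions propagate
        }));
      }
      for (Future<?> f : pending) {
        f.get();                 // rethrows the first copy failure
      }
    } finally {
      pool.shutdown();
    }
  }
}
{code}
Blocking on all the futures at the end keeps the rename fail-fast while
still letting the small copies drain out from behind a large one.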
HADOOP-16189 looks at moving away from the transfer manager and doing the
copies ourselves. I'm not yet ready to take that on, but the 200-response
error of HADOOP-16188 means I now have some doubts about its longevity. I
just don't want to rush into that.
* We know rename will never go away; it's too ubiquitous.
* We know that directory renames are a major bottleneck in things, even
"hadoop fs -rm" commands, let alone large Hive jobs.
* If we can show a tangible speedup, it's justified.
But: we need to retain consistency with S3Guard in the presence of failure.
Proposed: after every copy call completes, S3Guard is immediately updated
with the entry for that destination now existing, and we'll push the
deletions to S3Guard after every bulk delete call.
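A sketch of that sequencing (again illustrative: MetadataUpdater is a
hypothetical stand-in for the S3Guard metadata store, not its real API):
{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

public class CopyThenUpdateSketch {

  /** Hypothetical stand-in for the single-object COPY call. */
  interface ObjectCopier {
    void copyObject(String srcKey, String dstKey);
  }

  /** Hypothetical stand-in for the S3Guard metadata store. */
  interface MetadataUpdater {
    void recordCopied(String dstKey);   // persist "dstKey now exists"
  }

  /**
   * Run the copy, and the instant it completes record the new destination
   * entry, so the store is never ahead of what is actually in S3.
   */
  static CompletableFuture<Void> copyThenRecord(ObjectCopier copier,
      MetadataUpdater store, String srcKey, String dstKey, Executor pool) {
    return CompletableFuture
        .runAsync(() -> copier.copyObject(srcKey, dstKey), pool)
        .thenRun(() -> store.recordCopied(dstKey));
  }
}
{code}
The point is the ordering: the store write happens strictly after that
object's copy succeeds, so a failure partway through the rename leaves
S3Guard describing only objects which actually exist.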
> S3a rename() to copy files in a directory in parallel
> -----------------------------------------------------
>
> Key: HADOOP-13600
> URL: https://issues.apache.org/jira/browse/HADOOP-13600
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.7.3
> Reporter: Steve Loughran
> Assignee: Sahil Takiar
> Priority: Major
> Attachments: HADOOP-13600.001.patch
>
>
> Currently a directory rename does a one-by-one copy, making the request
> O(files * data). If the copy operations were launched in parallel, the
> duration of the copy may be reducible to the duration of the longest copy.
> For a directory with many files, this will be significant.