[https://issues.apache.org/jira/browse/HADOOP-13600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16795337#comment-16795337]

Steve Loughran commented on HADOOP-13600:
-----------------------------------------

I'm reviewing this again. Nominally, the S3 transfer manager is parallelized 
anyway.

But if a many-GB copy is taking place there, all the small copy operations 
that follow are held up, even though many of them could be executed. So 
yes, we do need something to do the work in batches.
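A minimal sketch of batching, assuming a plain `ExecutorService` rather than the transfer manager. `BatchedCopy`, `copyInBatches` and the `copyOne` callback are hypothetical names for illustration, not S3A code. Within a batch the small copies run alongside a large one instead of queuing behind it, and `Future.get()` acts as a barrier that surfaces any copy failure before the next batch starts.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

public class BatchedCopy {

  /**
   * Runs copyOne for every key, submitting at most batchSize copies at a time
   * to the pool and waiting for each batch to finish before starting the next.
   * (Hypothetical helper; copyOne stands in for a single object copy.)
   */
  static void copyInBatches(List<String> keys, int batchSize, ExecutorService pool,
      Consumer<String> copyOne) throws InterruptedException, ExecutionException {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    for (int start = 0; start < keys.size(); start += batchSize) {
      int end = Math.min(start + batchSize, keys.size());
      List<Future<?>> batch = new ArrayList<>();
      for (String key : keys.subList(start, end)) {
        // Small copies in this batch proceed while a large one is still running.
        batch.add(pool.submit(() -> copyOne.accept(key)));
      }
      // Barrier: get() blocks until each copy finishes and rethrows its failure.
      for (Future<?> f : batch) {
        f.get();
      }
    }
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    // Thread-safe list, because copies in the same batch run concurrently.
    List<String> copied = Collections.synchronizedList(new ArrayList<>());
    try {
      copyInBatches(List.of("a", "b", "c", "d", "e"), 2, pool, copied::add);
    } finally {
      pool.shutdown();
    }
    System.out.println(copied.size() + " copies done");
  }
}
```

The per-batch barrier is one design choice among several: it bounds how many copies (and later, metadata updates) are in flight at once. Submitting every copy and letting the pool's queue do the throttling would avoid the barrier, at the cost of less predictable failure handling.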

HADOOP-16189 looks at moving away from the transfer manager and doing it 
ourselves. I'm not yet ready to take that on, but the 200 error of HADOOP-16188 
means I now have some doubts about the transfer manager's longevity. I just 
don't want to rush into that.

* We know rename will never go away; it's too ubiquitous.
* We know that directory renames are a major bottleneck, even for "hadoop 
fs -rm" commands, let alone large Hive jobs.
* If we can show a tangible speedup, it's justified.

But we need to retain consistency with S3Guard in the presence of failure. 
Proposed: after every copy call completes, S3Guard is updated immediately with 
the info about that dir existing. We'll update S3Guard for the deletes after 
every bulk delete.
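The proposed ordering can be sketched as follows, under loud assumptions: `ObjectOps` and `InMemoryMetadata` are hypothetical stand-ins for the object store and the S3Guard table, not the S3A or S3Guard classes, and the loop is sequential to keep the ordering visible. The point is that the metadata is written only after the corresponding store operation succeeds, so if a copy fails partway, the metadata describes exactly the copies and deletes that completed.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class RenameWithMetadata {

  /** Hypothetical object-store operations; both throw on failure. */
  interface ObjectOps {
    void copy(String src, String dest);
    void bulkDelete(List<String> keys);
  }

  /** Hypothetical in-memory stand-in for the S3Guard metadata table. */
  static final class InMemoryMetadata {
    final Set<String> entries = new LinkedHashSet<>();
    void put(String key) { entries.add(key); }
    void deleteAll(List<String> keys) { entries.removeAll(keys); }
  }

  /** Renames every key under srcDir to destDir, deleting sources pageSize at a time. */
  static void rename(List<String> sources, String srcDir, String destDir, int pageSize,
      ObjectOps store, InMemoryMetadata meta) {
    List<String> pending = new ArrayList<>();
    for (String src : sources) {
      String dest = destDir + src.substring(srcDir.length());
      store.copy(src, dest);
      // Record the new entry as soon as this copy has completed.
      meta.put(dest);
      pending.add(src);
      if (pending.size() == pageSize) {
        deletePage(pending, store, meta);
      }
    }
    if (!pending.isEmpty()) {
      deletePage(pending, store, meta);
    }
  }

  /** Issues one bulk delete, then removes exactly those keys from the metadata. */
  private static void deletePage(List<String> pending, ObjectOps store,
      InMemoryMetadata meta) {
    List<String> page = List.copyOf(pending);
    pending.clear();
    store.bulkDelete(page);
    meta.deleteAll(page);
  }
}
```

In a parallel rename, the `meta.put(dest)` call would move into each copy task, so the metadata store would have to tolerate concurrent updates.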

> S3a rename() to copy files in a directory in parallel
> -----------------------------------------------------
>
>                 Key: HADOOP-13600
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13600
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.7.3
>            Reporter: Steve Loughran
>            Assignee: Sahil Takiar
>            Priority: Major
>         Attachments: HADOOP-13600.001.patch
>
>
> Currently a directory rename does a one-by-one copy, making the request 
> O(files * data). If the copy operations were launched in parallel, the 
> duration of the copy may be reducible to the duration of the longest copy. 
> For a directory with many files, this will be significant.
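The claim in the description can be made concrete with a rough scheduling estimate. This is an idealized sketch, assuming $n$ files with individual copy times $t_i$, $p$ worker threads that each take the next queued copy when idle, and no per-request overhead, throttling, or bandwidth sharing between concurrent copies (real S3 behaviour will differ):

```latex
T_{\text{seq}} = \sum_{i=1}^{n} t_i
\qquad
\max\!\Big(\max_i t_i,\ \frac{T_{\text{seq}}}{p}\Big)
\;\le\; T_{\text{par}} \;\le\;
\frac{T_{\text{seq}}}{p} + \max_i t_i
```

With illustrative numbers (not measurements): one large file taking 100 s plus 1000 small files at 1 s each gives $T_{\text{seq}} = 1100$ s. With $p = 32$, $T_{\text{seq}}/p = 34.375$ s, so $100 \le T_{\text{par}} \le 134.375$ s; with $p \ge n$ the parallel time reaches the longest single copy, 100 s.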



