[
https://issues.apache.org/jira/browse/HADOOP-13600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15693551#comment-15693551
]
Steve Loughran commented on HADOOP-13600:
-----------------------------------------
I've just stuck up my Nov 10 patch as a PR too. Mine is WIP; I'd stopped as I'd
got to a point where it was too complex. Now we know of the thread pool
problems in the TransferManager, we'll have to look at both.
What I have pulled out from mine, which I want to get in first, is having
{{innerRename()}} raise meaningful exceptions, rather than just return
true/false with no details whatsoever. That's part of HADOOP-13823.
Why? So that when I do an S3A-specific committer, it can stop having to deal
with the ambiguity of rename returning false, and instead fail with meaningful
messages. When I start that, I'd make {{innerRename()}} package-private, or add
some new @Private-scoped {{void renameStrict()}} call *which would also take
some progressable callback*.
Looking at your code, I like how you rely on AWS callbacks to trigger deletes.
However, it's nice to be able to pool those to avoid throttling from too many
requests; that could be done by (optionally) building up the list and only
triggering a delete when a threshold was reached.
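That threshold-triggered batching could be sketched like this; all names here
({{BatchedDeleter}}, {{onCopyCompleted}}, the {{deleteCall}} standing in for a
multi-object delete request) are hypothetical, not the actual S3A code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Accumulates source keys to delete and flushes them in fixed-size batches. */
class BatchedDeleter {
  private final int batchSize;
  private final Consumer<List<String>> deleteCall;  // e.g. one multi-object DELETE
  private final List<String> pending = new ArrayList<>();

  BatchedDeleter(int batchSize, Consumer<List<String>> deleteCall) {
    this.batchSize = batchSize;
    this.deleteCall = deleteCall;
  }

  /** Invoked from the per-file copy-completed callback. */
  synchronized void onCopyCompleted(String sourceKey) {
    pending.add(sourceKey);
    if (pending.size() >= batchSize) {
      flush();
    }
  }

  /** Issue one delete request covering everything queued so far. */
  synchronized void flush() {
    if (!pending.isEmpty()) {
      deleteCall.accept(new ArrayList<>(pending));
      pending.clear();
    }
  }
}
```

One delete request per N completed copies rather than one per file keeps the
request rate, and hence the throttling risk, down; a final {{flush()}} at the
end of the rename picks up the remainder.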
What your patch doesn't have (and I'd planned to, but not done, myself) is
async listing. If we retain the current "process 5000, list and repeat"
strategy, then the listing creates a bottleneck, as may the wait for all
entries in a single batch to complete. (I'm not sure how often that situation
arises; it would if there were, say, a 4GB file and lots of other small files
in the tree: you could block on that 4GB file even while there is more to copy.)
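One way to sketch that async listing is a producer/consumer queue: the listing
loop feeds keys to copy workers as each page arrives, instead of blocking on a
whole batch. This is illustrative only; the paged listing is simulated and the
copy is a counter increment:

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch: listing feeds copy workers through a queue, so copies can start
 *  as soon as keys are listed instead of waiting for a full page of 5000. */
class AsyncListingCopy {
  private static final String POISON = "";  // end-of-listing marker

  /** pages simulates paged LIST results; returns how many keys were "copied". */
  static int copyTree(List<List<String>> pages, int workers) throws Exception {
    BlockingQueue<String> keys = new LinkedBlockingQueue<>(1000);
    AtomicInteger copied = new AtomicInteger();
    ExecutorService pool = Executors.newFixedThreadPool(workers);
    for (int i = 0; i < workers; i++) {
      pool.execute(() -> {
        try {
          String key;
          while (!(key = keys.take()).equals(POISON)) {
            copied.incrementAndGet();   // stand-in for the COPY request
          }
          keys.put(POISON);             // pass the marker on to the next worker
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    // Producer side: the listing loop no longer waits for copies to finish.
    for (List<String> page : pages) {
      for (String key : page) {
        keys.put(key);
      }
    }
    keys.put(POISON);
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.MINUTES);
    return copied.get();
  }
}
```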
That listing bottleneck we could maybe ignore. Other issues:
# failures. If one copy fails, we'd want to stop submitting any more, even
while ongoing work is still allowed to complete.
# queue saturation. Unless there's a separate rename thread pool (my strategy),
we should have some blocking queue of pending copies. Why? It stops all other
IO from blocking just because one thread submitted 5000 copy operations.
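Both points could be sketched together with a bounded queue plus a fail-fast
flag; {{CallerRunsPolicy}} makes the submitting thread execute the copy itself
when the queue is full, so one big rename can't monopolise a shared pool. All
names are illustrative, not the actual S3A code:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Sketch: bounded queue of pending copies, plus stop-on-first-failure. */
class BoundedCopySubmitter {
  private final ThreadPoolExecutor pool;
  private final AtomicBoolean failed = new AtomicBoolean();

  BoundedCopySubmitter(int threads, int queueDepth) {
    pool = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(queueDepth),        // bounded pending-copy queue
        new ThreadPoolExecutor.CallerRunsPolicy());  // queue full: caller copies
  }

  /** Submit one copy; a no-op once any earlier copy has failed. */
  void submit(Runnable copy) {
    if (failed.get()) {
      return;  // stop queueing new work; in-flight copies still complete
    }
    pool.execute(() -> {
      try {
        copy.run();
      } catch (RuntimeException e) {
        failed.set(true);  // record the failure for the rename to rethrow
      }
    });
  }

  /** Wait for in-flight work; returns false if any copy failed. */
  boolean awaitCompletion() throws InterruptedException {
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.MINUTES);
    return !failed.get();
  }
}
```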
> S3a rename() to copy files in a directory in parallel
> -----------------------------------------------------
>
> Key: HADOOP-13600
> URL: https://issues.apache.org/jira/browse/HADOOP-13600
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.7.3
> Reporter: Steve Loughran
> Assignee: Steve Loughran
>
> Currently a directory rename does a one-by-one copy, making the request
> O(files * data). If the copy operations were launched in parallel, the
> duration of the copy may be reducible to the duration of the longest copy.
> For a directory with many files, this will be significant.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)