[ 
https://issues.apache.org/jira/browse/HADOOP-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385641#comment-16385641
 ] 

wujinhu edited comment on HADOOP-15262 at 3/8/18 2:36 AM:
----------------------------------------------------------

Thanks [~Sammi] for your review comments. I have fixed items 1 to 4.

For 5, the copy operation will become inexpensive because OSS will support 
shallow copy soon. Users can configure a higher number of threads to copy 
files, so it is hard to define an upper limit on the waiting list size (unlike 
the pre-read configuration, where read operations are expensive). Although the 
queue is defined as an unbounded queue, we use SemaphoredDelegatingExecutor to 
limit the concurrency for a single directory. 
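
To illustrate the idea, here is a minimal sketch using only java.util.concurrent. BoundedSubmitter and peakConcurrency are hypothetical names and this is a simplified stand-in, not Hadoop's SemaphoredDelegatingExecutor. It assumes the limiting works by taking a permit before a task is handed to the pool, so the pool's queue can stay unbounded while only a fixed number of tasks are in flight.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a semaphore-guarded executor; not the Hadoop class. */
class BoundedSubmitter {
  private final ExecutorService delegate;
  private final Semaphore permits;

  BoundedSubmitter(ExecutorService delegate, int permitCount) {
    this.delegate = delegate;
    this.permits = new Semaphore(permitCount);
  }

  /** Blocks the caller until a permit is free, then hands the task to the pool. */
  Future<?> submit(Runnable task) throws InterruptedException {
    permits.acquire();
    try {
      return delegate.submit(() -> {
        try {
          task.run();
        } finally {
          permits.release(); // free the slot once the task has finished
        }
      });
    } catch (RuntimeException e) {
      permits.release(); // task was rejected, so it will never release the permit
      throw e;
    }
  }

  /** Runs taskCount dummy "copies" and returns the peak number running at once. */
  static int peakConcurrency(int poolThreads, int permitCount, int taskCount) {
    // newFixedThreadPool is backed by an unbounded LinkedBlockingQueue.
    ExecutorService pool = Executors.newFixedThreadPool(poolThreads);
    BoundedSubmitter submitter = new BoundedSubmitter(pool, permitCount);
    AtomicInteger running = new AtomicInteger();
    AtomicInteger peak = new AtomicInteger();
    List<Future<?>> futures = new ArrayList<>();
    try {
      for (int i = 0; i < taskCount; i++) {
        futures.add(submitter.submit(() -> {
          int now = running.incrementAndGet();
          peak.accumulateAndGet(now, Math::max);
          try {
            Thread.sleep(5); // stands in for the copy work
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
          running.decrementAndGet();
        }));
      }
      for (Future<?> f : futures) {
        f.get();
      }
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
    return peak.get();
  }
}
```

With 8 pool threads and 3 permits, peakConcurrency(8, 3, 20) never reports more than 3 tasks running together, even though the pool could run 8 and its queue has no bound.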

For 6, since we read only one field of the AliyunOSSCopyFileContext class, 
there is no need to call lock() (we may copy one more file when the whole 
rename operation has failed, but that is OK). Reducing the number of lock() 
calls also improves performance.
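
The check-before-submit pattern can be sketched as follows. CopyContext and copyAll are hypothetical names, not the actual AliyunOSSCopyFileContext code, and the loop is sequential for clarity. The flag is declared volatile here so that the unlocked read has defined visibility; that is an assumption of this sketch and is not stated in the comment above.

```java
import java.util.ArrayList;
import java.util.List;

class CopyFailureFlagSketch {
  /** Hypothetical context; only the failure flag is modelled here. */
  static class CopyContext {
    // Assumption: volatile, so a reader sees the write without taking a lock.
    private volatile boolean copyFailure;

    boolean isCopyFailure() {
      return copyFailure;
    }

    void setCopyFailure(boolean failed) {
      copyFailure = failed;
    }
  }

  /** Attempts files in order and stops once a failure has been observed. */
  static List<String> copyAll(List<String> files, String failingFile) {
    CopyContext ctx = new CopyContext();
    List<String> attempted = new ArrayList<>();
    for (String file : files) {
      // Unlocked read of a single field. In a parallel version the flag could
      // be set just after this check, so one extra copy may still be started.
      if (ctx.isCopyFailure()) {
        break;
      }
      attempted.add(file);
      if (file.equals(failingFile)) {
        ctx.setCopyFailure(true); // simulate a failed copy
      }
    }
    return attempted;
  }
}
```

The worst case of the unlocked read is the one described above: a single additional copy is attempted after the rename has already failed, which wastes some work but does not affect correctness.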


> AliyunOSS: rename() to move files in a directory in parallel
> ------------------------------------------------------------
>
>                 Key: HADOOP-15262
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15262
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/oss
>    Affects Versions: 3.0.0
>            Reporter: wujinhu
>            Assignee: wujinhu
>            Priority: Major
>             Fix For: 3.1.0, 2.9.1, 3.0.1
>
>         Attachments: HADOOP-15262.001.patch, HADOOP-15262.002.patch, 
> HADOOP-15262.003.patch, HADOOP-15262.004.patch, HADOOP-15262.005.patch
>
>
> Currently, the rename() operation renames files serially, which is slow when 
> a directory contains many files. We can improve this by renaming files in 
> parallel.


