[ https://issues.apache.org/jira/browse/HDFS-15484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163190#comment-17163190 ]

Yang Yun commented on HDFS-15484:
---------------------------------

Thanks [[email protected]] for your detailed explanation.

Got it: changes to FS APIs should be made cautiously, and they also require 
updating the relevant FS spec.
I'll leave this JIRA open for discussion, and I can continue the work if you 
think it's worthwhile.

The motivation for this batch operation comes from specific workloads. For 
example, at the end of a Spark job many files are renamed or deleted, and the 
resulting IPC peak hurts NameNode performance. The batch operation differs from 
the original operation, but it really does improve performance in our 
distributed filesystem.

Yes, we need to define the *batch* semantics. Any operation could be batched, 
so we need to define the ordering, the exceptions, and the return value. One 
simple approach: the client sends the operations in a fixed order and the 
server executes them one by one. Each single operation behaves exactly as 
before. If any one fails, the server immediately returns the failed index and 
the failure reason to the client. For example:

rename(/a:/b:/c, /b:/c:/a)
step 1: (/a > /b) success
step 2: (/b > /c) success
step 3: (/c > /a) success
returns 3 (all operations succeeded); the net effect is nothing, since /a ends 
up back at /a.

rename(/a:/a/subdir, /b/c:/b)
step 1: (/a > /b/c) success
step 2: (/a/subdir > /b) failed, because /a/subdir no longer exists (it moved 
with /a in step 1)
returns failed index 1 and the failure reason (/a/subdir does not exist)

rename(/a:/nonexistent, /b:/c)
step 1: (/a > /b) success
step 2: (/nonexistent > /c) failed
returns failed index 1 and the failure reason (/nonexistent does not exist)

rename(/a:/b, /c:/c)
step 1: (/a > /c) success
step 2: (/b > /c) failed
returns failed index 1 and the failure reason (/c already exists)
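
The one-by-one semantics above can be sketched in Java against a toy in-memory 
namespace. This is a hypothetical model, not HDFS code: `BatchRenameSketch`, 
`BatchResult`, and `renameBatch` are illustrative names, and the rules that a 
rename fails when the source is missing or the destination already exists are 
assumptions chosen to reproduce the four examples. The returned count is the 
number of operations that succeeded, which is also the zero-based index of the 
failed operation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantLock;

/** Toy model of the proposed one-by-one batch rename; not HDFS code. */
class BatchRenameSketch {

    /** How many operations succeeded, and why the next one failed (null if none did). */
    static final class BatchResult {
        final int completed;
        final String failureReason;

        BatchResult(int completed, String failureReason) {
            this.completed = completed;
            this.failureReason = failureReason;
        }
    }

    // Hypothetical namespace: every existing file or directory as an absolute path.
    private final TreeSet<String> paths = new TreeSet<>();
    // One lock held for the whole batch, mirroring the "one RPC and one lock" goal.
    private final ReentrantLock lock = new ReentrantLock();

    BatchRenameSketch(String... existing) {
        for (String p : existing) {
            paths.add(p);
        }
    }

    boolean exists(String path) {
        return paths.contains(path);
    }

    /** Runs the renames in order and stops at the first failure; earlier renames stay applied. */
    BatchResult renameBatch(List<String> srcs, List<String> dsts) {
        if (srcs.size() != dsts.size()) {
            throw new IllegalArgumentException("source and destination counts differ");
        }
        lock.lock();
        try {
            for (int i = 0; i < srcs.size(); i++) {
                String reason = renameOne(srcs.get(i), dsts.get(i));
                if (reason != null) {
                    return new BatchResult(i, reason); // i is the zero-based failed index
                }
            }
            return new BatchResult(srcs.size(), null);
        } finally {
            lock.unlock();
        }
    }

    /** Toy single rename (no overwrite, no parent checks); returns a failure reason, or null on success. */
    private String renameOne(String src, String dst) {
        if (!paths.contains(src)) {
            return src + " does not exist";
        }
        if (paths.contains(dst)) {
            return dst + " already exists";
        }
        if (dst.startsWith(src + "/")) {
            return "cannot move " + src + " under itself";
        }
        // Collect src and everything below it, then re-key those entries under dst.
        List<String> moved = new ArrayList<>();
        for (String p : paths) {
            if (p.equals(src) || p.startsWith(src + "/")) {
                moved.add(p);
            }
        }
        for (String p : moved) {
            paths.remove(p);
        }
        for (String p : moved) {
            paths.add(dst + p.substring(src.length()));
        }
        return null;
    }
}
```

Starting from a namespace holding only /a, the first example returns 
`completed == 3` with no failure reason and leaves /a where it started; the 
second example, starting from /a, /a/subdir and /b, returns `completed == 1` 
with the reason "/a/subdir does not exist". Because renames before the failure 
stay applied, the batch is not atomic, so the caller needs the returned count 
to know what was already done.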

In any case, each single operation behaves exactly as before; the operations 
simply run one by one on the server side. 
A server that does not support batching will return an exception, and other 
filesystems can likewise throw an exception for a batch request.
Paths could also be combined in other ways, for example in JSON format.
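
The colon-joined argument form used in these examples could be decoded on the 
server roughly as follows. This is a minimal sketch on plain strings: 
`BatchPaths`, `split`, and `pair` are hypothetical names, not Hadoop APIs, and 
rejecting empty entries and mismatched counts is an assumed validation policy. 
It also shows why a bare delimiter is fragile: splitting a `::`-joined string on 
a single `:` yields an empty entry, which is one argument for a structured 
encoding such as JSON.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical decoder for delimiter-joined batch path lists; not a Hadoop API. */
class BatchPaths {

    /** Splits e.g. "/a:/b:/c" into [/a, /b, /c], rejecting empty entries. */
    static List<String> split(String joined, String delimiter) {
        List<String> out = new ArrayList<>();
        // Pattern.quote matches the delimiter literally; limit -1 keeps trailing
        // empty entries so that they are rejected instead of silently dropped.
        for (String p : joined.split(Pattern.quote(delimiter), -1)) {
            if (p.isEmpty()) {
                throw new IllegalArgumentException("empty path in batch: " + joined);
            }
            out.add(p);
        }
        return out;
    }

    /** Pairs sources with destinations; both lists must have the same length. */
    static List<String[]> pair(String srcs, String dsts, String delimiter) {
        List<String> s = split(srcs, delimiter);
        List<String> d = split(dsts, delimiter);
        if (s.size() != d.size()) {
            throw new IllegalArgumentException(
                s.size() + " sources but " + d.size() + " destinations");
        }
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < s.size(); i++) {
            pairs.add(new String[] {s.get(i), d.get(i)});
        }
        return pairs;
    }
}
```

For instance, `BatchPaths.pair("/a:/nonexistent", "/b:/c", ":")` yields the 
pairs (/a, /b) and (/nonexistent, /c), the input of the third example.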

I also think async operations are very valuable, especially for batch 
operations. If there is anything you need me to do, please let me know.

> Add option in enum Rename to suport batch rename
> ------------------------------------------------
>
>                 Key: HDFS-15484
>                 URL: https://issues.apache.org/jira/browse/HDFS-15484
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: dfsclient, namenode, performance
>            Reporter: Yang Yun
>            Assignee: Yang Yun
>            Priority: Minor
>         Attachments: HDFS-15484.001.patch
>
>
> Sometimes we need to rename many files after a task. Add a new option in enum 
> Rename to support batch rename, which needs only one RPC and one lock. For 
> example,
> rename(new Path("/dir1/f1::/dir2/f2"), new Path("/dir3/f1::dir4/f4"), 
> Rename.BATCH)
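
On the client side, the proposed call needs its source and destination lists 
joined with `::`. `Rename.BATCH` is only proposed here and is not part of 
Hadoop's existing `Options.Rename` enum; the helper below is a hypothetical 
sketch on plain strings rather than `org.apache.hadoop.fs.Path`, and it assumes 
that paths containing `:` should be rejected, since a colon inside a path could 
make the joined string ambiguous to decode.

```java
import java.util.List;

/** Hypothetical client-side encoder for the proposed "::"-joined batch arguments. */
class BatchArgs {
    static final String SEPARATOR = "::";

    /** Joins a non-empty list of paths with "::"; rejects empty paths and paths containing ':'. */
    static String join(List<String> paths) {
        if (paths.isEmpty()) {
            throw new IllegalArgumentException("batch must not be empty");
        }
        for (String p : paths) {
            // A ':' inside a path could be confused with the separator when decoding.
            if (p.isEmpty() || p.contains(":")) {
                throw new IllegalArgumentException("invalid path for batch: '" + p + "'");
            }
        }
        return String.join(SEPARATOR, paths);
    }
}
```

Joining /dir1/f1 and /dir2/f2 yields "/dir1/f1::/dir2/f2", the source string 
in the example above; the destination list would be joined the same way.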



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
