[
https://issues.apache.org/jira/browse/HDFS-15484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163190#comment-17163190
]
Yang Yun commented on HDFS-15484:
---------------------------------
Thanks [[email protected]] for your detailed explanation.
Got it: changes to FS APIs need more caution, and the relevant FS spec must be
updated as well.
I will leave this jira open for discussion. I can continue working on it if you
think it is worthwhile.
The motivation for this batch operation is a special kind of user. For example,
at the end of a Spark job, many files are renamed or deleted, and the resulting
IPC peak impacts the performance of the Namenode. The batch operation differs
from the original operation, but it really can improve performance in our
distributed filesystem.
Yes, we need to define the *batch*. Any operation may be batched, so we need to
define the ordering, the exceptions and the return value. One simple way: the
client sends the operations in a fixed order and the server side executes them
one by one. Every single operation behaves the same as before. If any one
fails, the server side immediately returns the failed index and the failure
reason to the client. For example,
rename(/a:/b:/c, /b:/c:/a)
step 1: (/a > /b) success
step 2: (/b > /c) success
step 3: (/c > /a) success
returns 3. The net effect is that nothing changes.
rename(/a:/a/subdir, /b/c:/b)
step 1: ( /a > /b/c ) success
step 2: (/a/subdir > /b) failed because "/a/subdir" does not exist.
will return the failed index 1 and the failure reason ("/a/subdir" does not
exist)
rename(/a:/nonexistent, /b:/c)
step 1: (/a > /b) success
step 2: (/nonexistent > /c) failed
will return index 1 and the failure reason (/nonexistent does not exist)
rename(/a:/b, /c:/c)
step 1: (/a > /c) success
step 2: (/b > /c) failed
will return 1 and the failure reason (/c already exists)
Anyway, each single operation is the same as before; the operations are only
run one by one on the server side.
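To make the proposed semantics concrete, here is a minimal sketch in plain Java.
It is only a toy in-memory model, not HDFS code and not the attached patch: the
class BatchRenameSketch, its Result type and the renameBatch method are
hypothetical names, the ":" separator is taken from the examples above, and
parent-directory checks are left out. The returned count equals the zero-based
index of the first failed operation, or the batch size if everything succeeds.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Toy in-memory model of the proposed batch rename (hypothetical, not HDFS code). */
class BatchRenameSketch {

    /** Outcome of a batch: number of completed renames and the failure reason, if any. */
    static final class Result {
        final int completed;   // zero-based index of the failed operation, or the batch size
        final String reason;   // null when every operation succeeded

        Result(int completed, String reason) {
            this.completed = completed;
            this.reason = reason;
        }
    }

    // The set of paths that currently exist in the toy namespace.
    private final Set<String> paths = new HashSet<>();

    BatchRenameSketch(String... existing) {
        paths.addAll(Arrays.asList(existing));
    }

    boolean exists(String path) {
        return paths.contains(path);
    }

    /** A single rename without overwrite; returns null on success, else the reason. */
    private String renameOne(String src, String dst) {
        if (!paths.contains(src)) {
            return src + " does not exist";
        }
        if (paths.contains(dst)) {
            return dst + " already exists";
        }
        // Move src together with everything below it.
        Set<String> moved = new HashSet<>();
        for (String p : paths) {
            if (p.equals(src) || p.startsWith(src + "/")) {
                moved.add(p);
            }
        }
        paths.removeAll(moved);
        for (String p : moved) {
            paths.add(dst + p.substring(src.length()));
        }
        return null;
    }

    /** Runs the renames in the given order and stops at the first failure. */
    Result renameBatch(String srcs, String dsts) {
        String[] s = srcs.split(":");
        String[] d = dsts.split(":");
        if (s.length != d.length) {
            throw new IllegalArgumentException("source and destination counts differ");
        }
        for (int i = 0; i < s.length; i++) {
            String error = renameOne(s[i], d[i]);
            if (error != null) {
                return new Result(i, error);   // earlier renames stay applied
            }
        }
        return new Result(s.length, null);
    }
}
```

Note that in this sketch the renames completed before a failure are not rolled
back, which matches the one-by-one execution described above; whether a real
implementation should behave that way is exactly what needs to be specified.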
The server will return an exception if it does not support batch. Other
filesystems can throw an exception for the batch.
We can also combine the paths in other ways, for example in JSON format.
I also think async operation is very valuable, especially for batch operations.
If there is anything I need to do, please let me know.
> Add option in enum Rename to suport batch rename
> ------------------------------------------------
>
> Key: HDFS-15484
> URL: https://issues.apache.org/jira/browse/HDFS-15484
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: dfsclient, namenode, performance
> Reporter: Yang Yun
> Assignee: Yang Yun
> Priority: Minor
> Attachments: HDFS-15484.001.patch
>
>
> Sometimes we need to rename many files after a task. Add a new option in enum
> Rename to support batch rename, which only needs one RPC and one lock. For
> example,
> rename(new Path("/dir1/f1::/dir2/f2"), new Path("/dir3/f1::dir4/f4"),
> Rename.BATCH)