[
https://issues.apache.org/jira/browse/HDFS-9763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186308#comment-15186308
]
Tsz Wo Nicholas Sze commented on HDFS-9763:
-------------------------------------------
As an alternative to the merge API, we may support asynchronous HDFS access;
see HDFS-9924.
> Add merge api
> -------------
>
> Key: HDFS-9763
> URL: https://issues.apache.org/jira/browse/HDFS-9763
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: fs
> Reporter: Ashutosh Chauhan
> Assignee: Xiaobing Zhou
> Attachments: HDFS_Merge_API_Proposal.pdf
>
>
> It would be good to add a merge(Path dir1, Path dir2, ... ) API to HDFS.
> The semantics would be to move all files under dir1 to dir2, renaming
> files in case of collisions.
> In the absence of this API, Hive[1] has to check for a collision for each
> file, then come up with a unique name and try again, and so on. This is
> inefficient in multiple ways:
> 1) It generates a huge number of calls on the NN (at least 2 * number of
> source files in dir1).
> 2) It suffers from a TOCTOU[2] bug for the client-picked name in case of
> collision.
> 3) The whole operation is not atomic.
> A merge API as outlined above would be immensely useful for Hive and
> potentially for other HDFS users.
> [1]
> https://github.com/apache/hive/blob/release-2.0.0-rc1/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2576
> [2] https://en.wikipedia.org/wiki/Time_of_check_to_time_of_use
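
The client-side pattern described in the issue can be sketched roughly as follows. This is only an illustration under stated assumptions: FakeNamespace is a hypothetical in-memory stand-in that counts simulated NameNode calls, not the HDFS FileSystem API, and the "_copy_N" suffix is an illustrative naming scheme rather than a quote of Hive's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the check-then-rename loop; NOT real HDFS or Hive code. */
public class ClientSideMerge {

    /** Hypothetical namespace that counts every simulated NameNode call. */
    static class FakeNamespace {
        private final Set<String> paths = new HashSet<>();
        int calls = 0;

        /** Setup helper; deliberately not counted as a call. */
        void create(String path) { paths.add(path); }

        boolean exists(String path) {
            calls++;
            return paths.contains(path);
        }

        /** Moves src to dst; fails if src is missing or dst is taken. */
        boolean rename(String src, String dst) {
            calls++;
            if (!paths.contains(src) || paths.contains(dst)) {
                return false;
            }
            paths.remove(src);
            paths.add(dst);
            return true;
        }
    }

    /** Moves each named file from srcDir to dstDir, picking a new name on collision. */
    static List<String> merge(FakeNamespace ns, String srcDir,
                              List<String> names, String dstDir) {
        List<String> finalPaths = new ArrayList<>();
        for (String name : names) {
            String target = dstDir + "/" + name;
            int suffix = 1;
            // Check step: one call per candidate name.
            while (ns.exists(target)) {
                target = dstDir + "/" + name + "_copy_" + suffix++;
            }
            // Use step: on a real cluster another client could create
            // 'target' between the check above and this rename (TOCTOU).
            if (!ns.rename(srcDir + "/" + name, target)) {
                throw new IllegalStateException("rename failed for " + name);
            }
            finalPaths.add(target);
        }
        return finalPaths;
    }

    public static void main(String[] args) {
        FakeNamespace ns = new FakeNamespace();
        for (String n : Arrays.asList("a", "b", "c")) {
            ns.create("dir1/" + n);
        }
        ns.create("dir2/b"); // forces one collision
        List<String> moved = merge(ns, "dir1", Arrays.asList("a", "b", "c"), "dir2");
        System.out.println(moved + " in " + ns.calls + " calls");
    }
}
```

With three source files and one collision, the sketch issues 7 simulated calls: two per file plus one extra existence check for the colliding name. Nothing makes the sequence atomic, so a failure partway through leaves files split between the two directories.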