[
https://issues.apache.org/jira/browse/HDFS-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627227#comment-16627227
]
Steve Loughran commented on HDFS-8878:
--------------------------------------
HDFS-12090 will provide the API needed to do per-block uploads; a version of
DistCp running at the MR layer can partition a source file by blocks, copy
those blocks in parallel across the cluster, and then, again, concatenate the
pieces into the destination file.
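
As a rough illustration of the partition-then-concat idea, here is a minimal
sketch in plain Python. It does not use any Hadoop or HDFS API: the names
BlockRange, partition_by_blocks, copy_block and concat_parts are hypothetical,
an in-memory bytes object stands in for the source file, and a dict stands in
for the per-block files on the destination. A real concat may impose
constraints that are not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class BlockRange:
    """One block-aligned slice of the source file (hypothetical helper type)."""
    index: int   # position of the block within the file
    offset: int  # byte offset where the block starts
    length: int  # number of bytes in the block (the last one may be short)


def partition_by_blocks(file_length: int, block_size: int) -> List[BlockRange]:
    """Split a file of file_length bytes into consecutive block-sized ranges."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if file_length < 0:
        raise ValueError("file_length must not be negative")
    ranges: List[BlockRange] = []
    offset = 0
    while offset < file_length:
        length = min(block_size, file_length - offset)
        ranges.append(BlockRange(len(ranges), offset, length))
        offset += length
    return ranges


def copy_block(source: bytes, block: BlockRange) -> bytes:
    """Stand-in for one task copying a single block to the destination."""
    return source[block.offset:block.offset + block.length]


def concat_parts(parts: Dict[int, bytes]) -> bytes:
    """Stand-in for the final concat: join the per-block files in index order."""
    return b"".join(parts[i] for i in sorted(parts))


if __name__ == "__main__":
    data = bytes(range(10))
    blocks = partition_by_blocks(len(data), block_size=4)
    # Tasks may finish in any order; keying by block index restores it.
    parts = {b.index: copy_block(data, b) for b in reversed(blocks)}
    assert concat_parts(parts) == data
```

Each BlockRange is independent of the others, so the copies could be handed to
separate tasks; only the final concat step needs all of them to be complete.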
> An HDFS built-in DistCp
> ------------------------
>
> Key: HDFS-8878
> URL: https://issues.apache.org/jira/browse/HDFS-8878
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Linxiao Jin
> Assignee: Linxiao Jin
> Priority: Major
>
> For now, we use DistCp to copy directories, which works quite well. However,
> it would be better to have an efficient directory copy tool built into HDFS.
> It could be faster by cutting out the redundant communication between HDFS,
> YARN and MapReduce. It could also free the resources DistCp consumes in the
> job tracker and YARN, and be easier to debug.
> We need more discussion on a new protocol between the NNs and DNs of
> different clusters to achieve HDFS-level command sending and data transfer.
> One available hacky solution: the srcNN gets the block distribution of the
> target file and asks each datanode to start a DFSClient and copy its local
> blocks, read via short-circuit, as files in the dst cluster. After all the
> block files in the dst cluster are complete, a DFSClient concats them
> together to form the target destination file. A more optimized solution
> might implement a newly designed protocol for communicating across clusters,
> rather than going through DFSClient, and use methods from a lower layer.