[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

liuguanghua (Jira) Sun, 21 Jul 2024 19:26:06 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867636#comment-17867636
 ]


liuguanghua commented on HDFS-2139:
-----------------------------------

[~hexiaoqiao], thanks for the reply.

For 3: fastcopy can be used within a federated cluster, within a single
cluster, and between two separate clusters that are not federated. The
difference is that fastcopy uses hardlinks within a federated cluster or a
single cluster, and uses transfer between two separate, non-federated clusters.
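
The rule above can be sketched as a small decision function. This is only an
illustration, not code from the patch: the names CopyMode and choose_copy_mode
are hypothetical, and comparing cluster IDs is an assumed stand-in for "same
single cluster or same federated cluster". The comment does not say how the
actual implementation detects this.

```python
from enum import Enum


class CopyMode(Enum):
    HARDLINK = "hardlink"
    TRANSFER = "transfer"


def choose_copy_mode(src_cluster_id: str, dst_cluster_id: str) -> CopyMode:
    """Pick the copy mode described in the comment.

    Assumption: equal cluster IDs mean source and destination share the same
    datanodes (a single cluster or a federated cluster), so a block file can
    be hardlinked in place. Otherwise the block data has to be transferred.
    """
    if src_cluster_id == dst_cluster_id:
        return CopyMode.HARDLINK
    return CopyMode.TRANSFER


# Hypothetical cluster IDs, for illustration only.
same_cluster = choose_copy_mode("CID-A", "CID-A")
cross_cluster = choose_copy_mode("CID-A", "CID-B")
```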

 

Test Data:

Block size: 128 MB

Data set: 1 TB of EC files + 1 TB of 3x-replicated files

 
|DistCp (map=20)|DistCp via FastCopy (HardLink)|DistCp via FastCopy (Transfer)|DistCp (original)|
|Time|5m6.687s|22m44.094s|38m17.024s|
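
To make the comparison easier to read, the timings can be converted to seconds
and expressed as a speedup over the original DistCp. The snippet below is a
small arithmetic sketch that uses only the three durations from the table. The
helper name to_seconds and the dictionary keys are illustrative.

```python
import re


def to_seconds(duration: str) -> float:
    """Convert a shell `time`-style duration such as '5m6.687s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", duration)
    if match is None:
        raise ValueError(f"unrecognised duration: {duration!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Durations copied from the table above.
timings = {
    "fastcopy_hardlink": "5m6.687s",
    "fastcopy_transfer": "22m44.094s",
    "distcp_original": "38m17.024s",
}

baseline = to_seconds(timings["distcp_original"])
# Speedup of each run relative to the original DistCp run.
speedups = {name: baseline / to_seconds(value) for name, value in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
```

On these numbers the hardlink path comes out at roughly 7.5x and the transfer
path at roughly 1.7x relative to the original DistCp run.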

[~zeekling], fastcopy can improve data copy efficiency, as the timings above
show.

> Fast copy for HDFS.
> -------------------
>
>                 Key: HDFS-2139
>                 URL: https://issues.apache.org/jira/browse/HDFS-2139
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Pritam Damania
>            Assignee: Rituraj
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, 
> HDFS-2139.patch, image-2022-08-11-11-48-17-994.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> There is a need to perform fast file copy on HDFS. The fast copy mechanism 
> for a file works as follows:
> 1) Query metadata for all blocks of the source file.
> 2) For each block 'b' of the file, find out its datanode locations.
> 3) For each block of the file, add an empty block to the namesystem for
> the destination file.
> 4) For each location of the block, instruct the datanode to make a local
> copy of that block.
> 5) Once each datanode has copied over its respective blocks, they
> report to the namenode about it.
> 6) Wait for all blocks to be copied and exit.
> This would speed up the copying process considerably by eliminating
> top-of-rack data transfers.
> Note: an extra improvement would be to instruct the datanode to create a
> hardlink of the block file if we are copying a block on the same datanode.
> [~xuzq_zander] provided a design doc:
> https://docs.google.com/document/d/1uGHA2dXLldlNoaYF-4c63baYjCuft_T88wdvhwVgh6c/edit?usp=sharing
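
The six steps in the description can be walked through with a toy in-memory
model. This is a sketch under stated assumptions, not HDFS code: the Datanode
and Namenode classes, the fast_copy method, and the "#blk" block-id scheme are
all hypothetical, and everything runs synchronously, so step 6 reduces to a
completeness check.

```python
from dataclasses import dataclass, field


@dataclass
class Datanode:
    name: str
    blocks: dict = field(default_factory=dict)  # block id -> bytes

    def copy_block_locally(self, src_id: str, dst_id: str) -> str:
        """Step 4: duplicate a block on this node, with no network transfer."""
        self.blocks[dst_id] = self.blocks[src_id]
        return dst_id  # the returned id stands in for the report in step 5


@dataclass
class Namenode:
    files: dict = field(default_factory=dict)      # path -> list of block ids
    locations: dict = field(default_factory=dict)  # block id -> list of Datanode

    def fast_copy(self, src: str, dst: str) -> None:
        self.files[dst] = []
        for index, src_block in enumerate(self.files[src]):  # step 1
            nodes = self.locations[src_block]                # step 2
            dst_block = f"{dst}#blk{index}"
            self.files[dst].append(dst_block)                # step 3: empty block
            self.locations[dst_block] = []
            for node in nodes:
                reported = node.copy_block_locally(src_block, dst_block)  # step 4
                self.locations[reported].append(node)        # step 5
        # Step 6: every destination block must have as many replicas as its source.
        for src_block, dst_block in zip(self.files[src], self.files[dst]):
            if len(self.locations[dst_block]) != len(self.locations[src_block]):
                raise RuntimeError(f"block {dst_block} is under-replicated")


# Toy setup: file /a has two blocks; the first is stored on both datanodes.
dn1, dn2 = Datanode("dn1"), Datanode("dn2")
dn1.blocks["/a#blk0"] = b"hello"
dn2.blocks["/a#blk0"] = b"hello"
dn1.blocks["/a#blk1"] = b"world"

nn = Namenode()
nn.files["/a"] = ["/a#blk0", "/a#blk1"]
nn.locations["/a#blk0"] = [dn1, dn2]
nn.locations["/a#blk1"] = [dn1]

nn.fast_copy("/a", "/b")
```

Each destination block ends up on exactly the datanodes that held the
corresponding source block, which is why no cross-rack traffic is needed in
this model.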



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
