[jira] [Comment Edited] (HDFS-2139) Fast copy for HDFS.

fanshilun (Jira) Wed, 24 Aug 2022 02:45:12 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584145#comment-17584145
 ]


fanshilun edited comment on HDFS-2139 at 8/24/22 9:44 AM:
----------------------------------------------------------

[~xuzq_zander]

I am very glad that this feature has been restarted, but I have the following
questions:
  1. Is there enough performance test data for HDFS-15294? What performance
improvement is expected from HDFS-2139 once it is implemented?
  2. The planning of tasks in the design document does not seem very clear.
Can you explain in detail what changes each task involves?

 

Task1: Add a new method LocalBlockCopyViaHardLink to Datanode

This does not seem to be described in the design document.
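For context, here is a minimal sketch of what a local block copy via hard link could involve on a single DataNode volume. It is hypothetical and is not the planned LocalBlockCopyViaHardLink method. The helper name `linkBlock`, the flat directory layout, and reusing the source generation stamp for the new block are all assumptions made for illustration. The only real API it uses is `java.nio.file.Files.createLink(link, existing)`, which requires both paths to be on the same filesystem (here, the same volume). It targets Java 11+.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class HardLinkBlockCopy {

    // Hypothetical helper (not an HDFS API): links a block's data file and its
    // checksum meta file under a new block id without copying any bytes.
    // Assumption: the new block reuses the source generation stamp.
    static void linkBlock(Path dir, long srcId, long dstId, long genStamp) throws IOException {
        Path srcData = dir.resolve("blk_" + srcId);
        Path srcMeta = dir.resolve("blk_" + srcId + "_" + genStamp + ".meta");
        Path dstData = dir.resolve("blk_" + dstId);
        Path dstMeta = dir.resolve("blk_" + dstId + "_" + genStamp + ".meta");
        // createLink(link, existing) adds a new directory entry for the same file.
        Files.createLink(dstData, srcData);
        Files.createLink(dstMeta, srcMeta);
    }

    public static void main(String[] args) throws IOException {
        // Simulate a finalized replica directory with one block and its meta file.
        Path dir = Files.createTempDirectory("finalized");
        Files.writeString(dir.resolve("blk_1001"), "block-bytes");
        Files.writeString(dir.resolve("blk_1001_7.meta"), "checksums");

        linkBlock(dir, 1001, 2002, 7);

        System.out.println(Files.readString(dir.resolve("blk_2002")));
        System.out.println(Files.isSameFile(dir.resolve("blk_1001"), dir.resolve("blk_2002")));
    }
}
```

Because a hard link shares the underlying data with the source replica, any later in-place modification of either file would be visible through both. A real implementation would need to account for that.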



> Fast copy for HDFS.
> -------------------
>
>                 Key: HDFS-2139
>                 URL: https://issues.apache.org/jira/browse/HDFS-2139
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Pritam Damania
>            Assignee: ZanderXu
>            Priority: Major
>         Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, 
> HDFS-2139.patch, image-2022-08-11-11-48-17-994.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> There is a need to perform fast file copies on HDFS. The fast copy mechanism
> for a file works as follows:
> 1) Query metadata for all blocks of the source file.
> 2) For each block 'b' of the file, find out its datanode locations.
> 3) For each block of the file, add an empty block to the namesystem for
> the destination file.
> 4) For each location of the block, instruct the datanode to make a local
> copy of that block.
> 5) Once each datanode has copied its respective blocks, it reports
> to the namenode.
> 6) Wait for all blocks to be copied and exit.
> This would speed up the copying process considerably by eliminating
> top-of-rack data transfers.
> Note: A further improvement would be to instruct the datanode to create a
> hard link to the block file when copying a block on the same datanode.
> [~xuzq_zander] provided a design doc:
> https://docs.google.com/document/d/1OHdUpQmKD3TZ3xdmQsXNmlXJetn2QFPinMH31Q4BqkI/edit?usp=sharing
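The six fast copy steps in the issue description can be modelled as a small in-memory simulation. This is a hypothetical sketch for illustration only. `NameNode`, `DataNode`, `Block`, `addEmptyBlock`, `copyLocal`, and `blockReceived` are invented names, not HDFS classes or RPCs. Step 6 runs synchronously here instead of waiting on asynchronous reports. Each new destination block is placed on the same datanodes as its source block, which is what allows step 4 to copy locally instead of over the network. It requires Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FastCopySketch {

    // Hypothetical block metadata: an id plus the datanodes holding a replica.
    record Block(long id, List<String> locations) {}

    // Hypothetical, heavily simplified namenode: file -> blocks, plus reports.
    static class NameNode {
        final Map<String, List<Block>> files = new HashMap<>();
        final Map<Long, Set<String>> reported = new HashMap<>();
        long nextBlockId = 1000;

        // Step 3: add an empty block to the destination file, placed on the
        // same datanodes as the source block so every copy can stay local.
        Block addEmptyBlock(String file, List<String> locations) {
            Block b = new Block(nextBlockId++, locations);
            files.computeIfAbsent(file, f -> new ArrayList<>()).add(b);
            reported.put(b.id(), new HashSet<>());
            return b;
        }

        // Step 5: a datanode reports that it now holds a replica of the block.
        void blockReceived(long blockId, String datanode) {
            reported.get(blockId).add(datanode);
        }

        // True once every expected replica of every block has been reported.
        boolean allReplicasReported(String file) {
            for (Block b : files.getOrDefault(file, List.of())) {
                if (!reported.get(b.id()).containsAll(b.locations())) {
                    return false;
                }
            }
            return true;
        }
    }

    // Hypothetical datanode storing replicas in memory, keyed by block id.
    static class DataNode {
        final String name;
        final Map<Long, byte[]> replicas = new HashMap<>();

        DataNode(String name) { this.name = name; }

        // Step 4: copy the replica locally (no network transfer), then report.
        void copyLocal(long srcId, long dstId, NameNode nn) {
            replicas.put(dstId, replicas.get(srcId).clone());
            nn.blockReceived(dstId, name);
        }
    }

    static void fastCopy(NameNode nn, Map<String, DataNode> dns, String src, String dst) {
        // Steps 1 and 2: read the source file's blocks and their locations.
        for (Block b : nn.files.get(src)) {
            Block target = nn.addEmptyBlock(dst, b.locations());
            for (String loc : b.locations()) {
                dns.get(loc).copyLocal(b.id(), target.id(), nn);
            }
        }
        // Step 6: every copy above ran synchronously, so just verify completion.
        if (!nn.allReplicasReported(dst)) {
            throw new IllegalStateException("fast copy incomplete for " + dst);
        }
    }

    public static void main(String[] args) {
        NameNode nn = new NameNode();
        Map<String, DataNode> dns = new HashMap<>();
        for (String n : List.of("dn1", "dn2", "dn3")) {
            dns.put(n, new DataNode(n));
        }
        // Source file with two blocks, each replicated on two datanodes.
        nn.files.put("/src", List.of(
                new Block(1, List.of("dn1", "dn2")),
                new Block(2, List.of("dn2", "dn3"))));
        dns.get("dn1").replicas.put(1L, "aaa".getBytes());
        dns.get("dn2").replicas.put(1L, "aaa".getBytes());
        dns.get("dn2").replicas.put(2L, "bbb".getBytes());
        dns.get("dn3").replicas.put(2L, "bbb".getBytes());

        fastCopy(nn, dns, "/src", "/dst");
        System.out.println(nn.files.get("/dst"));
    }
}
```

In this sketch, `copyLocal` clones the bytes. The hard-link improvement mentioned in the note would replace that clone with a link to the existing block file on the same volume.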



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
