[ https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127616#comment-15127616 ]
Liu Junhong commented on HDFS-2139:
-----------------------------------
How FastCopy works across two different NameNodes (assume the src file is on
NN1 and the dst file is on NN2):
1: send a create-file request to NN2
2: call getBlockLocations for the src file on NN1
3: send addBlock to NN2 using favoredNodes
4: send copyBlock to the datanodes returned by step 3
So, if the src file is deleted before step 1, step 2 will fail, and the dst
file will be deleted by the LeaseManager.
If the src file is deleted before step 4, some of the dst blocks will never
reach the finalized state, and the dst file will be deleted by FastCopy at
line 745.
But there is a worse case: if a runtime exception occurs, it will lead to
missing blocks.
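
To make the two deletion cases above concrete, here is a toy in-memory
sketch of the four steps. It is not the FastCopy code from the patch: the
class, the FakeNameNode type, the Outcome values and the deleteSrcAfter
parameter are all hypothetical names invented for illustration, and the
LeaseManager cleanup and the runtime-exception case are not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names are real HDFS APIs.
class FastCopyFlowSketch {
    /** Hypothetical stand-in for a NameNode: file path -> list of block ids. */
    static class FakeNameNode {
        final Map<String, List<String>> files = new HashMap<>();
    }

    enum Outcome { COPIED, SRC_MISSING, INCOMPLETE_DST_DELETED }

    /**
     * Runs the four steps against two fake NameNodes.
     * deleteSrcAfter simulates a concurrent delete of the src file once that
     * many blocks have been copied; pass -1 to never delete it.
     */
    static Outcome fastCopy(FakeNameNode nn1, FakeNameNode nn2,
                            String src, String dst, int deleteSrcAfter) {
        // Step 1: create the (still empty) dst file on NN2.
        nn2.files.put(dst, new ArrayList<>());

        // Step 2: ask NN1 for the blocks of the src file.
        List<String> located = nn1.files.get(src);
        if (located == null) {
            // src was deleted before step 1: the copy stops here and the
            // empty dst is left behind (in the comment above, the
            // LeaseManager eventually removes it; that is not modeled).
            return Outcome.SRC_MISSING;
        }
        List<String> srcBlocks = new ArrayList<>(located); // client-side snapshot

        int copied = 0;
        for (String block : srcBlocks) {
            if (copied == deleteSrcAfter) {
                nn1.files.remove(src); // simulated concurrent delete
            }
            // Step 3: allocate a new block for dst on NN2.
            nn2.files.get(dst).add(block + "-copy");
            // Step 4: ask the datanode to copy; fails if the src is gone.
            if (!nn1.files.containsKey(src)) {
                // The new block can never be finalized, so the client
                // removes the incomplete dst file.
                nn2.files.remove(dst);
                return Outcome.INCOMPLETE_DST_DELETED;
            }
            copied++;
        }
        return Outcome.COPIED;
    }

    public static void main(String[] args) {
        FakeNameNode nn1 = new FakeNameNode();
        FakeNameNode nn2 = new FakeNameNode();
        nn1.files.put("/src", Arrays.asList("blk_1", "blk_2"));
        System.out.println(fastCopy(nn1, nn2, "/src", "/dst", -1));
        System.out.println(fastCopy(nn1, nn2, "/src", "/dst2", 1));
        System.out.println(fastCopy(nn1, nn2, "/src", "/dst3", -1));
    }
}
```

The sketch only covers failures the client can see and clean up itself. A
runtime exception between step 3 and the cleanup would skip the removal of
the dst file, which is the case flagged above as the dangerous one.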
> Fast copy for HDFS.
> -------------------
>
> Key: HDFS-2139
> URL: https://issues.apache.org/jira/browse/HDFS-2139
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Pritam Damania
> Assignee: Rituraj
> Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch,
> HDFS-2139.patch
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> There is a need to perform fast file copy on HDFS. The fast copy mechanism
> for a file works as follows:
> 1) Query metadata for all blocks of the source file.
> 2) For each block 'b' of the file, find out its datanode locations.
> 3) For each block of the file, add an empty block to the namesystem for
> the destination file.
> 4) For each location of the block, instruct the datanode to make a local
> copy of that block.
> 5) Once each datanode has copied over its respective blocks, they
> report to the namenode about it.
> 6) Wait for all blocks to be copied and exit.
> This would speed up the copying process considerably by avoiding
> top-of-rack data transfers.
> Note: An extra improvement would be to instruct the datanode to create a
> hardlink of the block file if we are copying a block on the same datanode.
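
The hardlink idea in the note can be illustrated with plain java.nio calls.
This is only a sketch of the general technique, not code from any of the
attached patches: the class and method names are hypothetical, and it
assumes the block file is not modified in place after it has been linked.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; not part of HDFS.
class BlockLinkSketch {
    /**
     * Makes dstBlock refer to the same data as srcBlock.
     * Returns true if a hard link was created, false if the method fell
     * back to a byte-for-byte copy (for example when the file system does
     * not support hard links or the two paths are on different volumes).
     */
    static boolean linkOrCopy(Path srcBlock, Path dstBlock) throws IOException {
        try {
            // Files.createLink(link, existing): new directory entry, no data copied.
            Files.createLink(dstBlock, srcBlock);
            return true;
        } catch (UnsupportedOperationException | IOException e) {
            // Fallback: a regular copy. If dstBlock already exists, this
            // throws as well and the error propagates to the caller.
            Files.copy(srcBlock, dstBlock);
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("blocks");
        Path src = dir.resolve("blk_1");
        Files.write(src, new byte[] {1, 2, 3});
        boolean linked = linkOrCopy(src, dir.resolve("blk_2"));
        System.out.println(linked ? "hard link created" : "fell back to copy");
    }
}
```

A hard link only helps when source and destination live on the same
volume of the same datanode, which matches the condition stated in the
note; in every other case the data still has to be copied.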