[ https://issues.apache.org/jira/browse/HADOOP-4702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654620#action_12654620 ]
Konstantin Shvachko commented on HADOOP-4702:
---------------------------------------------

+1
Let us create a separate issue for reusing delBlockFromDisk() wherever it should or can be used.
This patch goes into 0.18, so we want to minimize changes.

> Failed block replication leaves an incomplete block in receiver's tmp data directory
> ------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4702
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4702
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.2
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.18.3
>
>         Attachments: tmpBlockRemoval.patch, tmpBlockRemoval1.patch, tmpBlockRemoval2.patch
>
>
> When a failure occurs while replicating a block from a source DataNode to a
> target DataNode, the target node keeps an incomplete on-disk copy of the
> block in its temp data directory and an in-memory copy of the block in the
> ongoingCreates queue. This causes two problems:
> 1. Since this block is not (and should not be) finalized, the NameNode is not
> aware of the existence of this incomplete block. It may schedule replication
> of the same block to this node again, which will fail with the message: "Block
> XX has already been started (though not completed), and thus cannot be created."
> 2. Restarting the datanode moves the blocks under the temp data directory to
> be valid blocks, thus introducing corrupted blocks into HDFS. Sometimes those
> corrupted blocks stay in the system undetected, if it happens that the partial
> block and its checksums match.
> A failed block replication should clean up both the in-memory and on-disk
> copies of the incomplete block.
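
The cleanup requested in the description could look roughly like the sketch below. It is not the attached patch and it is not Hadoop's DataNode code: the class TmpBlockReceiverSketch, the BlockSource interface, the blk_<id> file naming and the isOngoing() helper are all hypothetical, and a plain map stands in for ongoingCreates. It only illustrates the idea that a failed transfer should remove both the in-memory entry and the partial file in the temp directory.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a block receiver; not Hadoop's actual implementation. */
class TmpBlockReceiverSketch {
    /** Hypothetical source of block data; may fail part-way through the transfer. */
    interface BlockSource {
        void writeTo(OutputStream out) throws IOException;
    }

    // In-memory record of blocks currently being written (plays the role of ongoingCreates).
    private final Map<Long, Path> ongoingCreates = new ConcurrentHashMap<>();
    private final Path tmpDir;
    private final Path finalizedDir;

    TmpBlockReceiverSketch(Path tmpDir, Path finalizedDir) {
        this.tmpDir = tmpDir;
        this.finalizedDir = finalizedDir;
    }

    void receiveBlock(long blockId, BlockSource source) throws IOException {
        Path tmpFile = tmpDir.resolve("blk_" + blockId);
        // Refuse a second concurrent write of the same block.
        if (ongoingCreates.putIfAbsent(blockId, tmpFile) != null) {
            throw new IOException("Block " + blockId
                + " has already been started (though not completed), and thus cannot be created.");
        }
        boolean success = false;
        try {
            try (OutputStream out = Files.newOutputStream(tmpFile)) {
                source.writeTo(out);
            }
            // Only a fully received block leaves the temp directory.
            Files.move(tmpFile, finalizedDir.resolve(tmpFile.getFileName()));
            success = true;
        } finally {
            // Always drop the in-memory entry; on failure also drop the partial file.
            ongoingCreates.remove(blockId);
            if (!success) {
                Files.deleteIfExists(tmpFile);
            }
        }
    }

    boolean isOngoing(long blockId) {
        return ongoingCreates.containsKey(blockId);
    }
}
```

With this shape, a retry of the same block after a failed transfer is not rejected with the "already been started" message, and a restart finds no partial file in the temp directory to promote.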