Failed block replication leaves an incomplete block in receiver's tmp data 
directory
------------------------------------------------------------------------------------

                 Key: HADOOP-4702
                 URL: https://issues.apache.org/jira/browse/HADOOP-4702
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.17.2
            Reporter: Hairong Kuang
             Fix For: 0.20.0


When a failure occurs while replicating a block from a source DataNode to a 
target DataNode, the target node keeps an incomplete on-disk copy of the block 
in its temp data directory and an in-memory copy of the block in the 
ongoingCreates queue. This causes two problems:
1. Since this block is not (and should not be) finalized, the NameNode is not 
aware that this incomplete block exists. It may schedule replication of the 
same block to this node again, which will fail with the message: "Block XX has 
already been started (though not completed), and thus cannot be created."
2. Restarting the DataNode promotes the blocks under the temp data directory to 
valid blocks, thus introducing corrupted blocks into HDFS. Those corrupted 
blocks can stay in the system undetected if the partial block happens to match 
its checksums.

A failed block replication should clean up both the in-memory and on-disk 
copies of the incomplete block.
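
The proposed cleanup could look roughly like the sketch below. It is only an 
illustration and not the actual DataNode code: the class ReplicaCleanupSketch, 
the BlockSource interface, the receiveBlock method, and the "blk_" file naming 
are all hypothetical, and a plain map stands in for ongoingCreates. The point 
is that a failed transfer removes both the in-memory entry and the temp file, 
so a later replication of the same block to this node can start cleanly.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the receiving DataNode; not the real Hadoop classes. */
class ReplicaCleanupSketch {
    /** Hypothetical source of block data; may fail partway through the transfer. */
    interface BlockSource {
        void writeTo(Path tmpFile) throws IOException;
    }

    // Plays the role of ongoingCreates: block id -> temp file being written.
    final Map<Long, Path> ongoingCreates = new ConcurrentHashMap<>();
    final Path tmpDir;

    ReplicaCleanupSketch(Path tmpDir) {
        this.tmpDir = tmpDir;
    }

    /** Returns true if the block was fully received, false if the transfer failed. */
    boolean receiveBlock(long blockId, BlockSource source) throws IOException {
        Path tmpFile = tmpDir.resolve("blk_" + blockId);
        // Mirrors the check that produces the error quoted in problem 1.
        if (ongoingCreates.putIfAbsent(blockId, tmpFile) != null) {
            throw new IOException("Block " + blockId
                + " has already been started (though not completed),"
                + " and thus cannot be created.");
        }
        boolean success = false;
        try {
            source.writeTo(tmpFile);
            success = true;   // finalizing the block is out of scope for this sketch
            return true;
        } catch (IOException e) {
            return false;     // transfer failed; cleanup happens in finally
        } finally {
            if (!success) {
                // Drop the in-memory record and the partial on-disk copy together.
                ongoingCreates.remove(blockId);
                Files.deleteIfExists(tmpFile);
            }
        }
    }
}
```

With this shape, a second receiveBlock call for the same block id after a 
failure no longer hits the "already been started" error, and nothing is left 
in the temp directory for a restart to promote to a valid block.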

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
