[ https://issues.apache.org/jira/browse/HADOOP-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667901#action_12667901 ]
Hairong Kuang commented on HADOOP-5133:
---------------------------------------
Below is part of the log that illustrates the bug.
1. a block was allocated for a file
INFO hdfs.StateChange (FSNamesystem.java:allocateBlock(1398)) - BLOCK*
NameSystem.allocateBlock: /xx/file7. blk_2248817250507458558_1010
2. a write pipeline error occurred, and lease recovery added two datanodes to
the block's blockMap (a bug reported in HADOOP-5134) and set the block's
length to 6
INFO namenode.FSNamesystem
(FSNamesystem.java:commitBlockSynchronization(1835)) -
commitBlockSynchronization(lastblock=blk_2248817250507458558_1010,
newgenerationstamp=1011, newlength=6, newtargets=[127.0.0.1:51021,
127.0.0.1:51024])
3. when the block was finalized, a datanode sent blockReceived to the NN. The NN
then called addStoredBlock, which triggered the error below. DataNode
127.0.0.1:51021 did have a valid replica of length 63, but it was wrongly marked
as corrupt.
WARN namenode.FSNamesystem (FSNamesystem.java:addStoredBlock(2791)) -
Inconsistent size for block blk_2248817250507458558_1011 reported from
127.0.0.1:51024 current size is 6 reported size is 63
WARN namenode.FSNamesystem (FSNamesystem.java:addStoredBlock(2816)) - Mark
existing replica blk_2248817250507458558_1011 from 127.0.0.1:51021 as corrupt
because its length is shorter than the new one.
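The correct handling proposed in this issue can be sketched as a small decision function. This is only an illustrative sketch, not actual FSNamesystem code; the names (ReplicaAction, BlockLengthCheck, and the method parameters) are hypothetical:

```java
// Hypothetical sketch of the proposed addStoredBlock length check.
// ReplicaAction and BlockLengthCheck are illustrative names, not HDFS APIs.
enum ReplicaAction { ACCEPT, MARK_NEW_CORRUPT, UPDATE_NN_LENGTH }

class BlockLengthCheck {
    static ReplicaAction check(boolean underConstruction,
                               long nnRecordedLength, long reportedLength) {
        if (reportedLength == nnRecordedLength) {
            return ReplicaAction.ACCEPT;
        }
        if (!underConstruction) {
            // Rule 1: for a finalized block, any mismatch (shorter or longer)
            // marks the NEW replica as corrupt, never the existing ones.
            return ReplicaAction.MARK_NEW_CORRUPT;
        }
        // Rule 2: for an under-construction block, a shorter new replica
        // may be marked corrupt ...
        if (reportedLength < nnRecordedLength) {
            return ReplicaAction.MARK_NEW_CORRUPT;
        }
        // ... but a longer one just means the NN's recorded length (only a
        // lower bound while writing) should be updated; existing replicas
        // must not be marked corrupt based on that inaccurate length.
        return ReplicaAction.UPDATE_NN_LENGTH;
    }
}
```

In the scenario above, the block was still under construction with an NN recorded length of 6, so the 63-byte replica should have caused the NN to update its recorded length rather than mark the replica at 127.0.0.1:51021 as corrupt.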
> FSNameSystem#addStoredBlock does not handle inconsistent block length
> correctly
> -------------------------------------------------------------------------------
>
> Key: HADOOP-5133
> URL: https://issues.apache.org/jira/browse/HADOOP-5133
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.18.2
> Reporter: Hairong Kuang
> Fix For: 0.19.1
>
>
> Currently NameNode treats either the new replica or existing replicas as
> corrupt if the new replica's length is inconsistent with NN recorded block
> length. The correct behavior should be
> 1. For a block that is not under construction, the new replica should be
> marked as corrupt if its length is inconsistent (whether shorter or longer)
> with the NN recorded block length;
> 2. For an under construction block, if the new replica's length is shorter
> than the NN recorded block length, the new replica could be marked as
> corrupt; if the new replica's length is longer, NN should update its recorded
> block length. But it should not mark existing replicas as corrupt. This is
> because the NN recorded length of an under construction block does not
> necessarily match the block length on the datanode's disk, so the NN should
> not judge an under construction replica to be corrupt based on that
> inaccurate recorded length.
--
This message is automatically generated by JIRA.