[ 
https://issues.apache.org/jira/browse/HDFS-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243286#comment-13243286
 ] 

Uma Maheswara Rao G commented on HDFS-3162:
-------------------------------------------

I don't think this problem is because of append usage.

Looks like this is a race between markBlockAsCorrupt and 
processOverReplicatedBlocks.

1) The NN detects an over-replicated block and adds it to the invalidates list 
for DNn.
2) Before the invalidates list is processed, the BlockScanner finds that the 
block is corrupt on DNn and reports it to the NN.
3) Before the corrupt-block report acquires the lock, the invalidates list gets 
processed and the block is removed from the blocksMap for DNn.
4) Now markBlockAsCorrupt starts processing:

// Add this replica to corruptReplicas Map
corruptReplicas.addToCorruptReplicasMap(storedBlockInfo, node);
if (countNodes(storedBlockInfo).liveReplicas() > inode.getReplication()) {
  // the block is over-replicated so invalidate the replicas immediately
  invalidateBlock(storedBlockInfo, node);
} else {
  // add the block to neededReplication
  updateNeededReplications(storedBlockInfo, -1, 0);
}

Since it found enough live replicas, it calls invalidateBlock. invalidateBlock 
will try to remove the storedBlock from the blocksMap if live replicas are more 
than one, but that call just returns, because the block was already removed 
from the blocksMap.

However, the replica was already added to the corruptReplicas map (shown in the 
piece of code above).

So now the corruptReplicas map and the blocksMap disagree about the number of 
corrupt replicas.
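The race above can be condensed into a toy model. The sets below are 
illustrative stand-ins for the NN's blocksMap and corruptReplicas map, not the 
real BlockManager structures:

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the branch-1 race; the sets stand in for the NN's
// blocksMap (replicas of the block on DNn) and corruptReplicas map.
public class CorruptRaceSketch {

    static String simulate() {
        Set<String> blocksMap = new HashSet<>();
        Set<String> corruptReplicas = new HashSet<>();
        String blk = "blk_1332906029734_1719";

        blocksMap.add(blk);    // DNn's replica is known to the NN

        // Steps 1-3: over-replication processing removes DNn's replica
        // from the blocksMap before the corrupt report takes the lock.
        blocksMap.remove(blk);

        // Step 4: markBlockAsCorrupt adds to the corruptReplicas map
        // unconditionally; the later removeStoredBlock is a no-op
        // because the replica is already gone from the blocksMap.
        corruptReplicas.add(blk);
        blocksMap.remove(blk); // just returns, nothing to remove

        // The mismatch the NN then logs repeatedly:
        return "blockMap has " + (blocksMap.contains(blk) ? 1 : 0)
             + " but corrupt replicas map has "
             + (corruptReplicas.contains(blk) ? 1 : 0);
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```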

Mostly this issue exists only on branch-1.

I think this problem is already addressed in trunk.

Code from trunk:

// Add replica to the data-node if it is not already there
node.addBlock(storedBlock);

// Add this replica to corruptReplicas Map
corruptReplicas.addToCorruptReplicasMap(storedBlock, node, reason);
if (countNodes(storedBlock).liveReplicas() >= inode.getReplication()) {
  // the block is over-replicated so invalidate the replicas immediately
  invalidateBlock(storedBlock, node);
}

See the first line above: if the block is not already on the data-node, it is 
added back. I think this should have solved the problem in trunk.

                
> BlockMap's corruptNodes count and CorruptReplicas map count is not matching.
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-3162
>                 URL: https://issues.apache.org/jira/browse/HDFS-3162
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 1.0.0
>            Reporter: suja s
>            Assignee: Uma Maheswara Rao G
>            Priority: Minor
>             Fix For: 1.0.3
>
>
> Even after invalidating the block, the below log is coming continuously:
>  
> Inconsistent number of corrupt replicas for blk_1332906029734_1719blockMap 
> has 0 but corrupt replicas map has 1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
