[
https://issues.apache.org/jira/browse/HDFS-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287446#comment-13287446
]
Uma Maheswara Rao G commented on HDFS-3157:
-------------------------------------------
I think you have to handle one more case:
{code}
// Add replica to the data-node if it is not already there
node.addBlock(storedBlock);
// Add this replica to corruptReplicas Map
corruptReplicas.addToCorruptReplicasMap(storedBlock, node, reason);
if (countNodes(storedBlock).liveReplicas() >= bc.getReplication()) {
  // the block is over-replicated so invalidate the replicas immediately
  invalidateBlock(storedBlock, node);
} else if (namesystem.isPopulatingReplQueues()) {
  // add the block to neededReplication
  updateNeededReplications(storedBlock, -1, 0);
}
{code}
Here you are adding storedBlock, which carries the reported genstamp (assume that
genstamp is 1). When invalidateBlock runs, it will fail to remove the block (which
has the newer genstamp) from the node, because blocksMap#removeNode looks the
block up again in blocksMap and the genstamps do not match.
{code}
if (!blocksMap.removeNode(block, node)) {
  if (NameNode.stateChangeLog.isDebugEnabled()) {
    NameNode.stateChangeLog.debug("BLOCK* removeStoredBlock: "
        + block + " has already been removed from node " + node);
  }
  return;
}
{code}
How about adding the block that is present in blocksMap instead? That way, the
block can be removed successfully when blocksMap.removeNode is called.
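To illustrate the failure mode, here is a minimal standalone Java model (not HDFS code; the Block class, the map, and removeNode below are simplifications I am assuming for demonstration): when block identity includes the genstamp, removing by the reported (stale-genstamp) block fails, while removing by the block actually stored in the map succeeds.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Simplified stand-in for the NN blocksMap lookup (assumption: block
// identity here is the (blockId, genStamp) pair, mirroring the genstamp
// comparison that makes removeNode miss a stale reported replica).
public class GenstampLookupDemo {
  static final class Block {
    final long id;
    final long genStamp;
    Block(long id, long genStamp) { this.id = id; this.genStamp = genStamp; }
    @Override public boolean equals(Object o) {
      return o instanceof Block
          && ((Block) o).id == id && ((Block) o).genStamp == genStamp;
    }
    @Override public int hashCode() { return Objects.hash(id, genStamp); }
  }

  // holds the COMPLETE block with the newer genstamp
  static final Map<Block, String> blocksMap = new HashMap<>();

  // removal only succeeds if the exact (id, genstamp) pair is present
  static boolean removeNode(Block b) {
    return blocksMap.remove(b) != null;
  }

  public static void main(String[] args) {
    Block stored   = new Block(2903555284838653156L, 1003); // in blocksMap
    Block reported = new Block(2903555284838653156L, 1002); // stale RBW replica
    blocksMap.put(stored, "DN1");

    System.out.println(removeNode(reported)); // false: stale genstamp misses
    System.out.println(removeNode(stored));   // true: block from blocksMap
  }
}
```

This is why passing the reported block to invalidateBlock leaves the stored entry behind, whereas passing the block looked up from blocksMap removes it cleanly.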
> Error in deleting block is keep on coming from DN even after the block report
> and directory scanning has happened
> -----------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-3157
> URL: https://issues.apache.org/jira/browse/HDFS-3157
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.23.0, 0.24.0
> Reporter: J.Andreina
> Assignee: Ashish Singhi
> Attachments: HDFS-3157-1.patch, HDFS-3157-1.patch, HDFS-3157-2.patch,
> HDFS-3157.patch, HDFS-3157.patch, HDFS-3157.patch
>
>
> Cluster setup:
> 1 NN, three DNs (DN1, DN2, DN3), replication factor 2,
> "dfs.blockreport.intervalMsec" = 300, "dfs.datanode.directoryscan.interval" = 1
> step 1: write one file "a.txt" with sync (not closed)
> step 2: delete the block in one of the datanodes, say DN1 (from rbw), to which
> replication happened
> step 3: close the file
> Since the replication factor is 2, the block is replicated to the other
> datanode.
> Then, on the NN side, the following command is issued to the DN from which the
> block was deleted
> -------------------------------------------------------------------------------------
> {noformat}
> 2012-03-19 13:41:36,905 INFO org.apache.hadoop.hdfs.StateChange: BLOCK
> NameSystem.addToCorruptReplicasMap: duplicate requested for
> blk_2903555284838653156 to add as corrupt on XX.XX.XX.XX by /XX.XX.XX.XX
> because reported RBW replica with genstamp 1002 does not match COMPLETE
> block's genstamp in block map 1003
> 2012-03-19 13:41:39,588 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> Removing block blk_2903555284838653156_1003 from neededReplications as it has
> enough replicas.
> {noformat}
> On the datanode side, from which the block was deleted, the following
> exception occurred
> {noformat}
> 2012-02-29 13:54:13,126 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
> Unexpected error trying to delete block blk_2903555284838653156_1003.
> BlockInfo not found in volumeMap.
> 2012-02-29 13:54:13,126 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
> Error processing datanode Command
> java.io.IOException: Error in deleting blocks.
> at org.apache.hadoop.hdfs.server.datanode.FSDataset.invalidate(FSDataset.java:2061)
> at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:581)
> at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:545)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:690)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:522)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:662)
> at java.lang.Thread.run(Thread.java:619)
> {noformat}