[
https://issues.apache.org/jira/browse/HDFS-11019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wei-Chiu Chuang resolved HDFS-11019.
------------------------------------
Resolution: Duplicate
I am pretty sure this is a dup of HDFS-9958. Thanks [~kshukla] for confirming
this!
> Inconsistent number of corrupt replicas if a corrupt replica is reported
> multiple times
> ---------------------------------------------------------------------------------------
>
> Key: HDFS-11019
> URL: https://issues.apache.org/jira/browse/HDFS-11019
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Environment: CDH5.7.2
> Reporter: Wei-Chiu Chuang
> Attachments: HDFS-11019.test.patch
>
>
> While investigating a block corruption issue, I found the following warning
> message in the namenode log:
> {noformat}
> (a client reports a block replica is corrupt)
> 2016-10-12 10:07:37,166 INFO BlockStateChange: BLOCK
> NameSystem.addToCorruptReplicasMap: blk_1073803461 added as corrupt on
> 10.0.0.63:50010 by /10.0.0.62 because client machine reported it
> 2016-10-12 10:07:37,166 INFO BlockStateChange: BLOCK* invalidateBlock:
> blk_1073803461_74513(stored=blk_1073803461_74553) on 10.0.0.63:50010
> 2016-10-12 10:07:37,166 INFO BlockStateChange: BLOCK* InvalidateBlocks: add
> blk_1073803461_74513 to 10.0.0.63:50010
> (another client reports a block replica is corrupt)
> 2016-10-12 10:07:37,728 INFO BlockStateChange: BLOCK
> NameSystem.addToCorruptReplicasMap: blk_1073803461 added as corrupt on
> 10.0.0.63:50010 by /10.0.0.64 because client machine reported it
> 2016-10-12 10:07:37,728 INFO BlockStateChange: BLOCK* invalidateBlock:
> blk_1073803461_74513(stored=blk_1073803461_74553) on 10.0.0.63:50010
> (ReplicationMonitor thread kicks in to invalidate the replica and add a new
> one)
> 2016-10-12 10:07:37,888 INFO BlockStateChange: BLOCK* ask 10.0.0.56:50010 to
> replicate blk_1073803461_74553 to datanode(s) 10.0.0.63:50010
> 2016-10-12 10:07:37,888 INFO BlockStateChange: BLOCK* BlockManager: ask
> 10.0.0.63:50010 to delete [blk_1073803461_74513]
> (the two maps are inconsistent)
> 2016-10-12 10:08:00,335 WARN
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Inconsistent
> number of corrupt replicas for blk_1073803461_74553 blockMap has 0 but
> corrupt replicas map has 1
> {noformat}
> It seems that when a corrupt block replica is reported twice, blocksMap and
> corruptReplicasMap become inconsistent.
> Looking at the log, I suspect the bug is in
> {{BlockManager#removeStoredBlock}}. When a corrupt replica is reported,
> BlockManager removes the block from blocksMap. If the block has already been
> removed (that is, the corrupt replica is reported a second time), the method
> returns early. Otherwise (that is, the corrupt replica is reported for the
> first time), it removes the block from corruptReplicasMap (the block is added
> to corruptReplicasMap in {{BlockManager#markBlockAsCorrupt}}). Therefore, after
> the second corruption report, the corrupt replica is removed from blocksMap,
> but the one in corruptReplicasMap is not removed.
> I can't tell what the impact of this inconsistency is, but I think it's a
> good idea to fix it.
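
The suspected sequence in the description can be illustrated with a toy model.
This is only a sketch under stated assumptions: ToyBlockManager and all of its
methods are hypothetical stand-ins that use plain Java collections, not the
real Hadoop BlockManager, BlocksMap, or CorruptReplicasMap classes. It only
mirrors the early-return behavior described above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the suspected behavior; NOT the real Hadoop BlockManager. */
class ToyBlockManager {
    // block id -> datanodes holding a replica (hypothetical stand-in for blocksMap)
    private final Map<String, Set<String>> blocksMap = new HashMap<>();
    // block id -> datanodes whose replica is marked corrupt
    // (hypothetical stand-in for corruptReplicasMap)
    private final Map<String, Set<String>> corruptReplicasMap = new HashMap<>();

    void addStoredBlock(String block, String node) {
        blocksMap.computeIfAbsent(block, k -> new HashSet<>()).add(node);
    }

    /** Records the replica as corrupt, then removes it from the stored replicas. */
    void markBlockAsCorrupt(String block, String node) {
        corruptReplicasMap.computeIfAbsent(block, k -> new HashSet<>()).add(node);
        removeStoredBlock(block, node);
    }

    void removeStoredBlock(String block, String node) {
        Set<String> nodes = blocksMap.get(block);
        if (nodes == null || !nodes.remove(node)) {
            // Replica was already removed: return early, which leaves the
            // freshly added corruptReplicasMap entry in place.
            return;
        }
        Set<String> corrupt = corruptReplicasMap.get(block);
        if (corrupt != null) {
            corrupt.remove(node);
        }
    }

    int storedCount(String block) {
        Set<String> nodes = blocksMap.get(block);
        return nodes == null ? 0 : nodes.size();
    }

    int corruptCount(String block) {
        Set<String> nodes = corruptReplicasMap.get(block);
        return nodes == null ? 0 : nodes.size();
    }

    public static void main(String[] args) {
        ToyBlockManager bm = new ToyBlockManager();
        String block = "blk_1073803461";
        String node = "10.0.0.63:50010";
        bm.addStoredBlock(block, node);

        bm.markBlockAsCorrupt(block, node); // first client report
        bm.markBlockAsCorrupt(block, node); // second client report

        System.out.println("stored replicas on node: " + bm.storedCount(block));
        System.out.println("corrupt replicas recorded: " + bm.corruptCount(block));
    }
}
```

In this toy model the two maps agree after the first report. After the second
report the stored-replica map holds nothing for the block while the
corrupt-replica map still holds one entry, which matches the shape of the
"blockMap has 0 but corrupt replicas map has 1" warning.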