Wei-Chiu Chuang created HDFS-11155:
--------------------------------------
Summary: VolumeScanner should report the latest generation stamp
of a bad replica
Key: HDFS-11155
URL: https://issues.apache.org/jira/browse/HDFS-11155
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode
Affects Versions: 2.7.4
Environment: CDH5.7.2
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang
HDFS-10512 fixed a race condition that caused VolumeScanner to terminate
abruptly when a corrupt replica is detected. However, when VolumeScanner
detects a corrupt replica, it still reports the replica's old generation stamp
to the NN. The NN then directs the DN to remove the replica with that old
generation stamp, but because the generation stamp has since been updated, the
DN cannot find it, so the corrupt replica is not removed.
The NN's log shows something similar to the following:
{quote}
2016-11-17 21:08:05,350 INFO BlockStateChange: BLOCK
NameSystem.addToCorruptReplicasMap: blk_1077571736 added as corrupt on
192.168.168.58:50010 by /192.168.168.58 because client machine reported it
2016-11-17 21:08:05,350 INFO BlockStateChange: BLOCK* invalidateBlock:
blk_1077571736_3991953(stored=blk_1077571736_3992018) on 192.168.168.58:50010
{quote}
The DN's log has these:
{noformat}
2016-11-17 21:08:04,815 INFO
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Appending
to FinalizedReplica, blk_1077571736_3991953, FINALIZED
getNumBytes() = 39061752
getBytesOnDisk() = 39061752
getVisibleLength()= 39061752
getVolume() = /data/3/dfs/dn/current
getBlockFile() =
/data/3/dfs/dn/current/BP-1092022411-192.168.168.55-1474407949037/current/finalized/subdir58/subdir112/blk_1077571736
2016-11-17 21:08:09,158 INFO
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Failed to
delete replica blk_1077571736_3991953: ReplicaInfo not found.
{noformat}
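The sequence in the logs above can be modelled with a small, self-contained sketch. It is not the actual DataNode code: the class and method names (GenStampMismatchSketch, Replica, ReplicaMap, bumpGenStamp, invalidate, latestGenStamp) are hypothetical, and it assumes that the DN deletes a replica only when the requested generation stamp matches the stored one, which is what the "ReplicaInfo not found" message suggests. It also assumes the generation stamp was bumped by the append shown in the DN log. The block ID and generation stamps are taken from the logs.

```java
import java.util.HashMap;
import java.util.Map;

public class GenStampMismatchSketch {

    /** Hypothetical stand-in for a DN replica record (not the real ReplicaInfo). */
    static final class Replica {
        final long blockId;
        long genStamp;

        Replica(long blockId, long genStamp) {
            this.blockId = blockId;
            this.genStamp = genStamp;
        }
    }

    /** Hypothetical replica map holding one replica per block ID. */
    static final class ReplicaMap {
        private final Map<Long, Replica> replicas = new HashMap<>();

        void add(long blockId, long genStamp) {
            replicas.put(blockId, new Replica(blockId, genStamp));
        }

        /** Models an append bumping the generation stamp of an existing replica. */
        void bumpGenStamp(long blockId, long newGenStamp) {
            Replica r = replicas.get(blockId);
            if (r == null) {
                throw new IllegalArgumentException("unknown block " + blockId);
            }
            r.genStamp = newGenStamp;
        }

        /** Returns the stored generation stamp, or -1 if the block is unknown. */
        long latestGenStamp(long blockId) {
            Replica r = replicas.get(blockId);
            return r == null ? -1L : r.genStamp;
        }

        /**
         * Assumption: deletion succeeds only when the requested generation
         * stamp matches the stored one. Returns true if the replica was removed.
         */
        boolean invalidate(long blockId, long genStamp) {
            Replica r = replicas.get(blockId);
            if (r == null || r.genStamp != genStamp) {
                return false; // corresponds to "ReplicaInfo not found"
            }
            replicas.remove(blockId);
            return true;
        }
    }

    public static void main(String[] args) {
        long blockId = 1077571736L;
        long oldGs = 3991953L; // generation stamp reported to the NN
        long newGs = 3992018L; // generation stamp actually stored

        ReplicaMap map = new ReplicaMap();
        map.add(blockId, oldGs);
        map.bumpGenStamp(blockId, newGs);

        // Deleting with the stale generation stamp fails; the replica stays.
        System.out.println("delete with stale GS: " + map.invalidate(blockId, oldGs));
        // Deleting with the latest generation stamp succeeds.
        System.out.println("delete with latest GS: "
                + map.invalidate(blockId, map.latestGenStamp(blockId)));
    }
}
```

Running main prints "delete with stale GS: false" followed by "delete with latest GS: true". Under the stated assumption, this is why reporting the latest generation stamp of the bad replica would let the invalidation reach the replica that is actually stored.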