[ https://issues.apache.org/jira/browse/HDFS-11155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wei-Chiu Chuang resolved HDFS-11155.
------------------------------------
    Resolution: Not A Problem

It turns out that the symptom described in this jira is part of HDFS-11160, which is its root cause. Closing this so that the fix can be concentrated on HDFS-11160.

> VolumeScanner should report the latest generation stamp of a bad replica
> ------------------------------------------------------------------------
>
>                 Key: HDFS-11155
>                 URL: https://issues.apache.org/jira/browse/HDFS-11155
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>   Affects Versions: 2.7.4
>         Environment: CDH5.7.3
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>
> HDFS-10512 fixed a race condition that caused VolumeScanner to terminate
> abruptly when a corrupt replica, which is being updated, is detected.
> However, when such a corrupt replica is detected, VolumeScanner still reports
> the old replica generation stamp to the NN. The NN then directs the DN to
> remove the older replica. Because the generation stamp has since been updated,
> the DN cannot find that replica, so the corrupt replica is never removed.
> NN's log shows something similar to the following:
> {quote}
> 2016-11-17 21:08:05,350 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: blk_1077571736 added as corrupt on 192.168.168.58:50010 by /192.168.168.58 because client machine reported it
> 2016-11-17 21:08:05,350 INFO BlockStateChange: BLOCK* invalidateBlock: blk_1077571736_3991953(stored=blk_1077571736_3992018) on 192.168.168.58:50010
> {quote}
> The DN's log has these:
> {noformat}
> 2016-11-17 21:08:04,815 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Appending to FinalizedReplica, blk_1077571736_3991953, FINALIZED
>   getNumBytes()     = 39061752
>   getBytesOnDisk()  = 39061752
>   getVisibleLength()= 39061752
>   getVolume()       = /data/3/dfs/dn/current
>   getBlockFile()    = /data/3/dfs/dn/current/BP-1092022411-192.168.168.55-1474407949037/current/finalized/subdir58/subdir112/blk_1077571736
> 2016-11-17 21:08:09,158 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Failed to delete replica blk_1077571736_3991953: ReplicaInfo not found.
> {noformat}
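
For readers who want to see the reported symptom in isolation, here is a minimal sketch. It is a toy model and not Hadoop's FsDatasetImpl or ReplicaMap code: the class StaleGenStampDemo and its methods addReplica, append and invalidate are hypothetical names, and the assumption that a delete request must match both block ID and generation stamp is inferred from the logs above. The block ID and generation stamps are the ones shown in the NN log.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a DataNode replica map. Hypothetical; not Hadoop code. */
class StaleGenStampDemo {
    // blockId -> generation stamp of the replica currently stored on this DN
    private final Map<Long, Long> replicas = new HashMap<>();

    void addReplica(long blockId, long genStamp) {
        replicas.put(blockId, genStamp);
    }

    /** An append to an existing replica bumps its generation stamp. */
    void append(long blockId, long newGenStamp) {
        if (!replicas.containsKey(blockId)) {
            throw new IllegalArgumentException("unknown block " + blockId);
        }
        replicas.put(blockId, newGenStamp);
    }

    /**
     * Removes the replica only if block ID and generation stamp both match
     * (assumed behaviour). Returns false when no such replica is found.
     */
    boolean invalidate(long blockId, long genStamp) {
        Long stored = replicas.get(blockId);
        if (stored == null || stored.longValue() != genStamp) {
            return false; // corresponds to "ReplicaInfo not found" in the DN log
        }
        replicas.remove(blockId);
        return true;
    }

    public static void main(String[] args) {
        StaleGenStampDemo dn = new StaleGenStampDemo();
        dn.addReplica(1077571736L, 3991953L);
        dn.append(1077571736L, 3992018L); // replica is updated while being scanned

        // Delete request built from the stale generation stamp that was reported.
        System.out.println("stale GS deleted:  " + dn.invalidate(1077571736L, 3991953L));
        // Delete request built from the latest generation stamp.
        System.out.println("latest GS deleted: " + dn.invalidate(1077571736L, 3992018L));
    }
}
```

In this model the request carrying the stale generation stamp leaves the replica in place, while a request carrying the latest one removes it, which is the behaviour the issue title asks VolumeScanner to enable. As noted in the resolution, the actual root cause is tracked in HDFS-11160.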