[ https://issues.apache.org/jira/browse/HDFS-11155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wei-Chiu Chuang resolved HDFS-11155.
------------------------------------
Resolution: Not A Problem
It turns out the symptom described in this JIRA is caused by HDFS-11160, which is
the root cause. Closing this issue so that I can concentrate my fix on
HDFS-11160.
> VolumeScanner should report the latest generation stamp of a bad replica
> ------------------------------------------------------------------------
>
> Key: HDFS-11155
> URL: https://issues.apache.org/jira/browse/HDFS-11155
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.7.4
> Environment: CDH5.7.3
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
>
> HDFS-10512 fixed a race condition that caused VolumeScanner to terminate
> abruptly when it detected a corrupt replica that was being updated.
> However, when such a corrupt replica is detected, VolumeScanner still reports
> the replica's old generation stamp to the NN. The NN then directs the DN to
> remove the replica with that older stamp. Because the generation stamp has
> since been updated, the DN cannot find the replica, so the corrupt replica
> remains corrupt.
> The NN's log shows something similar to the following:
> {quote}
> 2016-11-17 21:08:05,350 INFO BlockStateChange: BLOCK
> NameSystem.addToCorruptReplicasMap: blk_1077571736 added as corrupt on
> 192.168.168.58:50010 by /192.168.168.58 because client machine reported it
> 2016-11-17 21:08:05,350 INFO BlockStateChange: BLOCK* invalidateBlock:
> blk_1077571736_3991953(stored=blk_1077571736_3992018) on 192.168.168.58:50010
> {quote}
> The DN's log shows the following:
> {noformat}
> 2016-11-17 21:08:04,815 INFO
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
> Appending to FinalizedReplica, blk_1077571736_3991953, FINALIZED
> getNumBytes() = 39061752
> getBytesOnDisk() = 39061752
> getVisibleLength()= 39061752
> getVolume() = /data/3/dfs/dn/current
> getBlockFile() =
> /data/3/dfs/dn/current/BP-1092022411-192.168.168.55-1474407949037/current/finalized/subdir58/subdir112/blk_1077571736
> 2016-11-17 21:08:09,158 INFO
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Failed
> to delete replica blk_1077571736_3991953: ReplicaInfo not found.
> {noformat}
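The failure mode quoted above (the NN invalidating a replica by a stale generation stamp after an append has bumped it) can be modeled in a few lines. This is a minimal sketch, not Hadoop's code: the class StaleGenStampDemo, its replica map, and the invalidate method are hypothetical, and it assumes, consistent with the DN log, that the DN matches both block ID and generation stamp when looking up the replica to delete. The block ID and the two stamps (3991953 reported, 3992018 stored) are taken from the logs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a DataNode replica map; not Hadoop's actual classes.
class StaleGenStampDemo {
    /** Replica map keyed by block ID; the value is the replica's current generation stamp. */
    static final Map<Long, Long> replicas = new HashMap<>();

    /**
     * Deletes a replica only if both block ID and generation stamp match.
     * A stale stamp leaves the replica in place, mirroring the
     * "ReplicaInfo not found" message in the DN log.
     */
    static boolean invalidate(long blockId, long genStamp) {
        Long current = replicas.get(blockId);
        if (current == null || current != genStamp) {
            System.out.println("Failed to delete replica blk_" + blockId + "_" + genStamp
                + ": ReplicaInfo not found.");
            return false;
        }
        replicas.remove(blockId);
        return true;
    }

    public static void main(String[] args) {
        long blockId = 1077571736L;
        replicas.put(blockId, 3991953L); // finalized replica before the append
        replicas.put(blockId, 3992018L); // append bumps the generation stamp
        // The scanner reported the old stamp, so the NN asks to delete blk_1077571736_3991953.
        boolean removed = invalidate(blockId, 3991953L);
        System.out.println("removed=" + removed);
    }
}
```

Running main prints the "Failed to delete replica blk_1077571736_3991953: ReplicaInfo not found." line followed by removed=false, and the replica with the newer stamp stays in the map, which is why the corrupt replica is never removed.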
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)