[ 
https://issues.apache.org/jira/browse/HDFS-11155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-11155:
-----------------------------------
    Environment: CDH5.7.3  (was: CDH5.7.2)

> VolumeScanner should report the latest generation stamp of a bad replica
> ------------------------------------------------------------------------
>
>                 Key: HDFS-11155
>                 URL: https://issues.apache.org/jira/browse/HDFS-11155
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.7.4
>         Environment: CDH5.7.3
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>
> HDFS-10512 fixed a race condition that caused VolumeScanner to terminate
> abruptly when it detected a corrupt replica. However, when a corrupt replica
> is detected, VolumeScanner still reports the replica's old generation stamp
> to the NN. The NN then directs the DN to remove the replica with that old
> generation stamp, but because the generation stamp has since been updated,
> the DN cannot find it, so the corrupt replica remains in place.
> The NN's log shows something similar to the following:
> {quote}
> 2016-11-17 21:08:05,350 INFO BlockStateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: blk_1077571736 added as corrupt on 
> 192.168.168.58:50010 by /192.168.168.58  because client machine reported it
> 2016-11-17 21:08:05,350 INFO BlockStateChange: BLOCK* invalidateBlock: 
> blk_1077571736_3991953(stored=blk_1077571736_3992018) on 192.168.168.58:50010
> {quote}
> The DN's log shows the following:
> {noformat}
> 2016-11-17 21:08:04,815 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: 
> Appending to FinalizedReplica, blk_1077571736_3991953, FINALIZED
>   getNumBytes()     = 39061752
>   getBytesOnDisk()  = 39061752
>   getVisibleLength()= 39061752
>   getVolume()       = /data/3/dfs/dn/current
>   getBlockFile()    = 
> /data/3/dfs/dn/current/BP-1092022411-192.168.168.55-1474407949037/current/finalized/subdir58/subdir112/blk_1077571736
> 2016-11-17 21:08:09,158 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Failed 
> to delete replica blk_1077571736_3991953: ReplicaInfo not found.
> {noformat}
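
To illustrate the mismatch described above, here is a minimal Java sketch. It is a simplified model, not the actual FsDatasetImpl code: the class ReplicaMapSketch and its methods are hypothetical names, and it assumes the DN matches a replica only when both the block ID and the generation stamp agree. The block ID and generation stamps are the ones from the logs above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a DN replica map; not Hadoop code.
class ReplicaMapSketch {
    // blockId -> generation stamp of the replica currently stored on this DN
    private final Map<Long, Long> replicas = new HashMap<>();

    // Record a replica, or bump its generation stamp (e.g. after an append).
    void addOrUpdate(long blockId, long genStamp) {
        replicas.put(blockId, genStamp);
    }

    // The generation stamp a scanner would need to report for the NN's
    // follow-up invalidation to match the stored replica.
    long latestGenStamp(long blockId) {
        Long gs = replicas.get(blockId);
        if (gs == null) {
            throw new IllegalArgumentException("unknown block " + blockId);
        }
        return gs;
    }

    // Assumption: deletion succeeds only on an exact generation stamp match.
    boolean invalidate(long blockId, long genStamp) {
        Long stored = replicas.get(blockId);
        if (stored == null || stored != genStamp) {
            return false; // analogous to "ReplicaInfo not found"
        }
        replicas.remove(blockId);
        return true;
    }

    public static void main(String[] args) {
        ReplicaMapSketch dn = new ReplicaMapSketch();
        long blockId = 1077571736L;
        dn.addOrUpdate(blockId, 3991953L); // finalized replica
        dn.addOrUpdate(blockId, 3992018L); // append bumps the generation stamp

        // Invalidation with the stale stamp does not match the stored replica.
        System.out.println(dn.invalidate(blockId, 3991953L)); // prints false

        // Invalidation with the latest stamp matches and removes it.
        System.out.println(dn.invalidate(blockId, dn.latestGenStamp(blockId))); // prints true
    }
}
```

In this model, reporting latestGenStamp(blockId) instead of the stamp captured when the scan began is what lets the subsequent invalidation find the replica.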


