[jira] [Commented] (HDFS-15200) Delete Corrupt Replica Immediately Irrespective of Replicas On Stale Storage

Hudson (Jira) Thu, 19 Mar 2020 10:32:25 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-15200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062798#comment-17062798
 ]


Hudson commented on HDFS-15200:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18068 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18068/])
HDFS-15200. Delete Corrupt Replica Immediately Irrespective of Replicas 
(ayushsaxena: rev f9bb2a8cc580f7bebbd890ad38e772f23bcb65f7)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestCorruptionWithFailover.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java


> Delete Corrupt Replica Immediately Irrespective of Replicas On Stale Storage 
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-15200
>                 URL: https://issues.apache.org/jira/browse/HDFS-15200
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Critical
>             Fix For: 3.3.0
>
>         Attachments: HDFS-15200-01.patch, HDFS-15200-02.patch, 
> HDFS-15200-03.patch, HDFS-15200-04.patch, HDFS-15200-05.patch
>
>
> Presently, before adding a replica to invalidates, {{invalidateBlock(..)}} 
> checks whether any replica of the block is on stale storage. If one is, it 
> postpones deletion of the replica.
> Here:
> {code:java}
>     // Check how many copies we have of the block
>     if (nr.replicasOnStaleNodes() > 0) {
>       blockLog.debug("BLOCK* invalidateBlocks: postponing " +
>           "invalidation of {} on {} because {} replica(s) are located on " +
>           "nodes with potentially out-of-date block reports", b, dn,
>           nr.replicasOnStaleNodes());
>       postponeBlock(b.getCorrupted());
>       return false;
>     }
> {code}
>  
> For a corrupt replica, we can skip this check and delete the replica 
> immediately, since a corrupt replica cannot be corrected.
> One consequence of the current behavior is that the NameNodes show different 
> block states after failover:
> When a replica is reported corrupt, the Active NN marks it as corrupt, marks 
> it for deletion, and removes it from corruptReplicas and 
> excessRedundancyMap.
> If failover happens before the replica is deleted, the Standby NameNode 
> marks all storages as stale.
> It then starts processing IBRs. Since the replicas are now on stale storage, 
> it skips the deletion and the removal from corruptReplicas.
> Hence the two NameNodes report different counts and different sets of 
> corrupt replicas.
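
The decision described above can be sketched as a small standalone model. This is not the BlockManager code and not the committed patch: the class name, the Decision enum, the replicaIsCorrupt parameter and the deleteCorruptImmediately switch are all hypothetical names introduced for illustration. It only shows the intended rule, assuming a corrupt replica should bypass the stale-storage check while everything else keeps the existing postpone behavior.

```java
// Simplified, hypothetical model of the check discussed in this issue.
// It does not use any Hadoop classes.
final class InvalidateDecisionSketch {

  /** What happens to a replica that was submitted for invalidation. */
  enum Decision { POSTPONE, INVALIDATE }

  /**
   * @param replicasOnStaleNodes     number of replicas on storages whose
   *                                 block reports may be out of date
   * @param replicaIsCorrupt         whether the replica to delete is corrupt
   * @param deleteCorruptImmediately hypothetical switch for the proposed
   *                                 behavior (name is an assumption)
   */
  static Decision decide(int replicasOnStaleNodes, boolean replicaIsCorrupt,
      boolean deleteCorruptImmediately) {
    // Proposed rule: a corrupt replica cannot be corrected, so there is
    // nothing to gain by waiting for fresh block reports.
    if (replicaIsCorrupt && deleteCorruptImmediately) {
      return Decision.INVALIDATE;
    }
    // Existing rule: postpone while any replica sits on stale storage.
    if (replicasOnStaleNodes > 0) {
      return Decision.POSTPONE;
    }
    return Decision.INVALIDATE;
  }

  public static void main(String[] args) {
    // Stale storage after failover, switch off: deletion is postponed.
    System.out.println(decide(1, true, false));
    // Same state, switch on: the corrupt replica is deleted right away.
    System.out.println(decide(1, true, true));
  }
}
```

With the switch on, a NameNode that has just marked every storage as stale reaches the same decision as the previously active one, which is the consistency this issue is after.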



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
