[jira] [Created] (HDFS-15200) Delete Corrupt Replica Immediately Irrespective of Replicas On Stale Storage

Ayush Saxena (Jira) Mon, 02 Mar 2020 04:14:11 -0800

Ayush Saxena created HDFS-15200:
-----------------------------------

             Summary: Delete Corrupt Replica Immediately Irrespective of 
Replicas On Stale Storage 
                 Key: HDFS-15200
                 URL: https://issues.apache.org/jira/browse/HDFS-15200
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Ayush Saxena
            Assignee: Ayush Saxena



Presently, before adding a replica to invalidates, {{invalidateBlock(..)}} 
checks whether any replica of the block is on stale storage. If one is, it 
postpones deletion of the replica.
Here:
{code:java}
    // Check how many copies we have of the block
    if (nr.replicasOnStaleNodes() > 0) {
      blockLog.debug("BLOCK* invalidateBlocks: postponing " +
          "invalidation of {} on {} because {} replica(s) are located on " +
          "nodes with potentially out-of-date block reports", b, dn,
          nr.replicasOnStaleNodes());
      postponeBlock(b.getCorrupted());
      return false;
    }
{code}
 
For a corrupt replica, we can skip this check and delete the replica 
immediately, since a corrupt replica cannot be corrected.
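
A minimal sketch of the proposed decision, as a toy model rather than the actual BlockManager change. The class {{InvalidateSketch}}, the {{Decision}} enum and the {{replicaIsCorrupt}} parameter are all hypothetical names introduced for illustration:

```java
/** Toy model of the postponement decision; not the real BlockManager code. */
final class InvalidateSketch {

  enum Decision { POSTPONE, INVALIDATE }

  /** Behavior described above: any replica on stale storage postpones deletion. */
  static Decision current(int replicasOnStaleNodes) {
    return replicasOnStaleNodes > 0 ? Decision.POSTPONE : Decision.INVALIDATE;
  }

  /**
   * Proposed behavior: a corrupt replica bypasses the stale-storage check.
   * The replicaIsCorrupt parameter is hypothetical, for illustration only.
   */
  static Decision proposed(int replicasOnStaleNodes, boolean replicaIsCorrupt) {
    if (replicaIsCorrupt) {
      return Decision.INVALIDATE;
    }
    return current(replicasOnStaleNodes);
  }

  public static void main(String[] args) {
    System.out.println("current, 1 stale replica: " + current(1));
    System.out.println("proposed, 1 stale replica, corrupt: " + proposed(1, true));
  }
}
```

How the real method would learn that the replica is corrupt, and whether the bypass should be configurable, is left open here.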

One consequence of the current behavior is that the namenodes show different 
block states after a failover:
When a replica is reported corrupt, the Active NN marks it as corrupt, marks 
it for deletion, and removes it from corruptReplicas and 
excessRedundancyMap.
Suppose a failover happens before the replica is actually deleted.
The Standby Namenode marks all the storages as stale, then starts processing 
IBRs. Since the replicas are now on stale storage, it skips both the deletion 
and the removal from corruptReplicas.
Hence the two namenodes show different counts and different corrupt 
replicas.
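
The divergence can be reproduced in a toy model. This is only a sketch under simplifying assumptions: {{ToyNameNode}}, {{reportCorrupt}} and the replica id string are hypothetical, and the model tracks nothing but a set of corrupt replicas and a single stale flag:

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the failover scenario; not real HDFS code. */
final class FailoverSketch {

  static final class ToyNameNode {
    final Set<String> corruptReplicas = new HashSet<>();
    final boolean storagesStale; // true for a namenode that has just taken over

    ToyNameNode(boolean storagesStale) {
      this.storagesStale = storagesStale;
    }

    /** Records a corrupt replica; returns true if its deletion was scheduled. */
    boolean reportCorrupt(String replicaId) {
      corruptReplicas.add(replicaId);
      if (storagesStale) {
        return false; // postponed: the replica stays in the corrupt set
      }
      corruptReplicas.remove(replicaId); // scheduled for deletion, no longer tracked
      return true;
    }
  }

  public static void main(String[] args) {
    ToyNameNode oldActive = new ToyNameNode(false);
    ToyNameNode newActive = new ToyNameNode(true);
    oldActive.reportCorrupt("blk_1@dn1");
    newActive.reportCorrupt("blk_1@dn1");
    System.out.println("old active corrupt count: " + oldActive.corruptReplicas.size());
    System.out.println("new active corrupt count: " + newActive.corruptReplicas.size());
  }
}
```

In this model the same report leaves the two namenodes with different corrupt replica sets, which is the mismatch described above.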



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

