siddhantsangwan opened a new pull request, #5794:
URL: https://github.com/apache/ozone/pull/5794

   ## What changes were proposed in this pull request?
   A `QUASI_CLOSED` container may have some `UNHEALTHY` replicas with the same 
sequence id as the container, while there are no healthy replicas with the 
correct sequence id. Such `UNHEALTHY` replicas cannot be deleted and must be 
kept around.
   
   If the DN hosting such an `UNHEALTHY` replica is put into decommission, 
decommission stays blocked because the `UNHEALTHY` replica cannot be lost, yet 
RM currently does nothing about it. This PR makes RM replicate these vulnerable 
`UNHEALTHY` replicas to other DNs so that decommission can succeed.
   
   Changes introduced:
   1. A new handler, `VulnerableUnhealthyReplicasHandler`, leverages the 
existing `replicaCount.getVulnerableUnhealthyReplicas` API to find such 
`UNHEALTHY` replicas. If found, the container is marked as under replicated and 
added to the under replication queue.
   2. The under replicated container is then handled in 
`RatisUnderReplicationHandler`. It tries to find a new target DN for each 
`UNHEALTHY` replica and sends replicate commands. The logic is similar to what 
we have already done for legacy RM. Some additional changes were required to 
correctly find out the used and excluded nodes to pass into the placement 
policy API for finding target DNs.
   3. The decommission monitor is changed so that both the legacy and the new 
RM use the `replicaSet.isHealthyEnoughForOffline` API. 
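   
   To illustrate the check described in point 1, here is a minimal, 
self-contained sketch. It is not the code in this PR: the class 
`VulnerableUnhealthySketch`, the `Replica` type, and the `hostDecommissioning` 
flag are hypothetical stand-ins, and the real 
`replicaCount.getVulnerableUnhealthyReplicas` API may apply further conditions. 
The sketch assumes an `UNHEALTHY` replica counts as vulnerable when it carries 
the container's sequence id, no non-`UNHEALTHY` replica carries that sequence 
id, and its host DN is being decommissioned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class VulnerableUnhealthySketch {

    /** Replica states used in this sketch (a subset chosen for illustration). */
    public enum State { QUASI_CLOSED, UNHEALTHY }

    /** Hypothetical stand-in for a container replica; not Ozone's own replica class. */
    public static final class Replica {
        final String datanode;
        final State state;
        final long sequenceId;
        final boolean hostDecommissioning; // assumption: decommission status is known per replica

        public Replica(String datanode, State state, long sequenceId,
                       boolean hostDecommissioning) {
            this.datanode = datanode;
            this.state = state;
            this.sequenceId = sequenceId;
            this.hostDecommissioning = hostDecommissioning;
        }
    }

    /**
     * Returns the datanodes hosting UNHEALTHY replicas that must be copied elsewhere
     * before their host can go offline. The list is empty when a healthy replica with
     * the container's sequence id already exists.
     */
    public static List<String> findVulnerableUnhealthy(long containerSequenceId,
                                                       List<Replica> replicas) {
        // A healthy replica with the correct sequence id means nothing is vulnerable.
        for (Replica r : replicas) {
            if (r.state != State.UNHEALTHY && r.sequenceId == containerSequenceId) {
                return new ArrayList<>();
            }
        }
        // Otherwise, collect UNHEALTHY replicas with the correct sequence id
        // whose host is being decommissioned.
        List<String> vulnerable = new ArrayList<>();
        for (Replica r : replicas) {
            if (r.state == State.UNHEALTHY
                    && r.sequenceId == containerSequenceId
                    && r.hostDecommissioning) {
                vulnerable.add(r.datanode);
            }
        }
        return vulnerable;
    }

    public static void main(String[] args) {
        List<Replica> replicas = Arrays.asList(
            new Replica("dn1", State.UNHEALTHY, 10L, true),
            new Replica("dn2", State.QUASI_CLOSED, 8L, false),
            new Replica("dn3", State.UNHEALTHY, 10L, false));
        // A non-empty result would mark the container as under replicated.
        System.out.println("Vulnerable replicas on: " + findVulnerableUnhealthy(10L, replicas));
    }
}
```

   In this example only `dn1` is reported: `dn2` is healthy but has a stale 
sequence id, and `dn3` is not being decommissioned. A non-empty result 
corresponds to the handler queuing the container as under replicated.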
   
   The third point above basically solves [ReplicationManager: Unhealthy 
replicas of a sufficiently replicated container can block 
decommissioning](https://issues.apache.org/jira/browse/HDDS-9383). If required, 
this can be split off into its own PR since this one is quite large.
   
   Still to do: add more tests to `TestRatisUnderReplicationHandler`.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-9592
   
   ## How was this patch tested?
   
   New tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]