siddhantsangwan opened a new pull request, #5261: URL: https://github.com/apache/ozone/pull/5261
## What changes were proposed in this pull request?

Problem: The legacy replication manager (LRM) currently resolves mismatched replicas (those whose replica state does not match SCM's container state) by:

1. Replicating the matching replicas until they are fully replicated.
2. Deleting the mismatched replicas.

This approach does not work when LRM is presented with the following small-cluster situation:

- SCM container state: CLOSED
- 5 datanodes in the cluster
- Replica states: CLOSED, CLOSED, QUASI_CLOSED, QUASI_CLOSED, QUASI_CLOSED

LRM will not make progress because every datanode already holds a replica, so there is no datanode left to which a CLOSED replica can be added.

Changes proposed: Try to delete an unhealthy replica (UNHEALTHY or QUASI_CLOSED) to free up a datanode for a healthy replica.

- We prefer deleting a replica whose sequence ID is lower than the container's.
- If the container is QUASI_CLOSED, the replica to be deleted should not have a unique origin node ID.
- The replica should be on a healthy, in-service node.

We do this only if there is no pending delete, there are at least 4 replicas, and at least one replica matches the container's lifecycle state.

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-9257

## How was this patch tested?

Yet to test.
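
To make the selection rule under "Changes proposed" concrete, here is a minimal sketch of how such a check might look. It is not the code in this patch: `UnhealthyReplicaPicker`, `Replica`, `State`, and `pickReplicaToDelete` are hypothetical names invented for illustration, not Ozone classes. The sketch also assumes that only replicas whose state differs from the container's state are deletion candidates.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified model: these are NOT Ozone's real classes or APIs.
class UnhealthyReplicaPicker {

  enum State { CLOSED, QUASI_CLOSED, UNHEALTHY }

  static final class Replica {
    final String datanode;
    final String originNodeId;
    final State state;
    final long sequenceId;
    final boolean onHealthyInServiceNode;

    Replica(String datanode, String originNodeId, State state,
            long sequenceId, boolean onHealthyInServiceNode) {
      this.datanode = datanode;
      this.originNodeId = originNodeId;
      this.state = state;
      this.sequenceId = sequenceId;
      this.onHealthyInServiceNode = onHealthyInServiceNode;
    }
  }

  // Returns the replica to delete, or empty if the preconditions do not hold.
  static Optional<Replica> pickReplicaToDelete(State containerState,
      long containerSequenceId, List<Replica> replicas, int pendingDeletes) {
    // Preconditions: no pending delete, at least 4 replicas, and at least
    // one replica matching the container's lifecycle state.
    boolean anyMatching =
        replicas.stream().anyMatch(r -> r.state == containerState);
    if (pendingDeletes > 0 || replicas.size() < 4 || !anyMatching) {
      return Optional.empty();
    }

    // Count replicas per origin node so unique origins can be detected.
    Map<String, Integer> originCounts = new HashMap<>();
    for (Replica r : replicas) {
      originCounts.merge(r.originNodeId, 1, Integer::sum);
    }

    List<Replica> candidates = new ArrayList<>();
    for (Replica r : replicas) {
      boolean unhealthy =
          r.state == State.UNHEALTHY || r.state == State.QUASI_CLOSED;
      // Assumption: a replica that matches the container state is kept.
      boolean mismatched = r.state != containerState;
      if (!unhealthy || !mismatched || !r.onHealthyInServiceNode) {
        continue;
      }
      // For a QUASI_CLOSED container, keep replicas with a unique origin.
      boolean uniqueOrigin = originCounts.get(r.originNodeId) == 1;
      if (containerState == State.QUASI_CLOSED && uniqueOrigin) {
        continue;
      }
      candidates.add(r);
    }

    // Prefer a replica whose sequence ID is lower than the container's:
    // Boolean false sorts before true, so "not behind" replicas come last.
    return candidates.stream().min(Comparator.comparing(
        (Replica r) -> r.sequenceId >= containerSequenceId));
  }
}
```

For the small-cluster scenario above (a CLOSED container with two CLOSED and three QUASI_CLOSED replicas on five datanodes), this sketch returns one of the QUASI_CLOSED replicas, preferring one whose sequence ID is behind the container's. It returns an empty result when a delete is already pending or fewer than 4 replicas exist.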
