Tejaskriya opened a new pull request, #6517:
URL: https://github.com/apache/ozone/pull/6517
## What changes were proposed in this pull request?

Many users try out Ozone on small clusters of 15 Datanodes or fewer. If a 15-DN cluster has some EC 10-4 containers, for example, then it is not possible to put more than 2 Datanodes into maintenance (`maintenance.remaining.redundancy` = 1 by default), because EC 10-4 requires at least (10+4-1) = 13 Datanodes.

If someone tries to move 3 Datanodes to maintenance, the maintenance process keeps looping, checking every 30 seconds whether it is possible to take those Datanodes offline. It never fails, even though it is clearly not possible to take the Datanodes offline.

In this PR, maintenance fails early if not enough Datanodes are available. The check is based on the maximum replication factor of the containers present in the cluster and the configs `maintenance.remaining.redundancy` and `maintenance.replica.minimum`. A corresponding `DatanodeAdminError` is returned.

The detailed design doc can be found in the EPIC [HDDS-10461](https://issues.apache.org/jira/browse/HDDS-10461).

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10463

## How was this patch tested?

Unit tests, including edge cases, have been added to `TestNodeDecommissionManager`.
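The early-fail check described above can be sketched as follows. This is a minimal, hypothetical illustration, not the code in the PR: the class `MaintenanceCheckSketch`, the method `checkSufficientNodes`, and the error message are invented for illustration. The sketch also assumes that the number of Datanodes that must stay in service (`requiredNodes`) has already been derived from the largest replication config in the cluster and the maintenance configs.

```java
import java.util.Optional;

/**
 * Hypothetical sketch of failing maintenance early. Names are illustrative
 * and are not the actual Ozone NodeDecommissionManager API.
 */
public final class MaintenanceCheckSketch {

  private MaintenanceCheckSketch() {
  }

  /**
   * Returns an error message if moving requestedNodes into maintenance would
   * leave fewer than requiredNodes in-service Datanodes, otherwise empty.
   */
  public static Optional<String> checkSufficientNodes(
      int inServiceNodes, int requestedNodes, int requiredNodes) {
    if (requestedNodes < 0 || requestedNodes > inServiceNodes) {
      throw new IllegalArgumentException(
          "requestedNodes must be between 0 and inServiceNodes");
    }
    // Datanodes that stay in service once the request is applied.
    int remaining = inServiceNodes - requestedNodes;
    if (remaining < requiredNodes) {
      return Optional.of(String.format(
          "Insufficient nodes: tried to start maintenance for %d of %d "
              + "in-service nodes, but at least %d must remain in service.",
          requestedNodes, inServiceNodes, requiredNodes));
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Numbers from the PR description: 15 Datanodes, 13 required.
    System.out.println(checkSufficientNodes(15, 2, 13).isPresent()); // false: 13 remain
    System.out.println(checkSufficientNodes(15, 3, 13).isPresent()); // true: only 12 remain
  }
}
```

Checking the node count before any Datanode changes state lets the admin request return an error immediately, instead of leaving the monitor re-checking every 30 seconds without ever finishing.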
