Tejaskriya opened a new pull request, #6367: URL: https://github.com/apache/ozone/pull/6367
## What changes were proposed in this pull request?

Many users try out Ozone on small clusters of 15 Datanodes or fewer. If a 15-DN cluster has some EC 10-4 containers, for example, it is not possible to decommission more than one Datanode, because EC 10-4 requires at least 14 Datanodes. If someone tries to decommission two Datanodes, the decommissioning process is designed to keep looping, checking every 30 seconds whether the two DNs can be taken offline. It never fails, even though it is clearly impossible to take the Datanodes offline.

With this PR, decommissioning fails early if sufficient Datanodes are not available, based on the maximum replication factor of the containers present in the cluster. It returns a corresponding `DatanodeAdminError`.

The detailed design doc can be found in the epic HDDS-10461.

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10462

## How was this patch tested?

Tested manually in a Docker cluster. The following cluster has 5 DNs, and RATIS-THREE is the maximum replication. Decommissioning three DNs would leave only two, fewer than the three that RATIS-THREE requires, so the command fails:

```
bash-4.2$ ozone admin datanode decommission ozone-datanode-4 ozone-datanode-5 ozone-datanode-2
Started decommissioning datanode(s):
ozone-datanode-4
ozone-datanode-5
ozone-datanode-2
Error: AllHosts: Sufficient nodes are not available.
Some nodes could not enter the decommission workflow
```

I will add unit tests and then mark the PR _ready for review_.
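
The early-failure check described above can be illustrated with a minimal sketch. This is not the code from the PR: the class `DecommissionCheckSketch` and its methods are hypothetical names, and the sketch assumes that the check simply compares the number of Datanodes that would remain against the largest node count required by any replication configuration in the cluster (3 for RATIS-THREE, 14 for EC 10-4).

```java
import java.util.List;

// Hypothetical sketch, not the PR's implementation.
final class DecommissionCheckSketch {

    // Largest number of Datanodes needed by any replication config in use.
    // Each list entry is the node count one config requires,
    // e.g. 3 for RATIS-THREE or 14 for EC 10-4 (10 data + 4 parity).
    static int maxRequiredNodes(List<Integer> requiredNodesPerConfig) {
        int max = 0;
        for (int required : requiredNodesPerConfig) {
            max = Math.max(max, required);
        }
        return max;
    }

    // True if enough Datanodes would remain after taking `toDecommission`
    // nodes offline. Assumption: every node in `totalNodes` is usable.
    static boolean canDecommission(int totalNodes, int toDecommission,
                                   List<Integer> requiredNodesPerConfig) {
        int remaining = totalNodes - toDecommission;
        return remaining >= maxRequiredNodes(requiredNodesPerConfig);
    }

    public static void main(String[] args) {
        // 15-DN cluster with RATIS-THREE and EC 10-4 containers.
        List<Integer> mixed = List.of(3, 14);
        System.out.println(canDecommission(15, 1, mixed)); // 14 remain
        System.out.println(canDecommission(15, 2, mixed)); // 13 remain

        // 5-DN cluster from the manual test, RATIS-THREE only.
        System.out.println(canDecommission(5, 3, List.of(3))); // 2 remain
    }
}
```

Under these assumptions, the request is rejected up front whenever the remaining node count drops below the maximum requirement, rather than being retried every 30 seconds.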
