[PR] HDDS-10462. Fail Datanode Decommission Early [ozone]

via GitHub Mon, 11 Mar 2024 23:35:36 -0700


Tejaskriya opened a new pull request, #6367:
URL: https://github.com/apache/ozone/pull/6367


   ## What changes were proposed in this pull request?
   
   Many users test out Ozone in small clusters of 15 Datanodes or fewer. If a 
15 DN cluster has some EC 10-4 containers, for example, then it's not possible 
to decommission more than 1 Datanode, because EC 10-4 requires at least 14 
Datanodes. If someone tries to decommission 2 Datanodes, the decommissioning 
process currently keeps looping, checking every 30 seconds whether it's 
possible to take the two DNs offline. It never fails, even though it's clearly 
not possible to take the Datanodes offline. 
   With this PR, decommissioning fails early if not enough Datanodes would 
remain available, based on the maximum replication factor of the containers 
present in the cluster. It returns a corresponding DatanodeAdminError. 
   The detailed design doc can be found in the epic HDDS-10461.
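   As a rough illustration of the arithmetic behind the early failure (this is 
not the code in this PR; the class and method names are hypothetical and only 
sketch the idea), the check compares the Datanodes that would remain against 
the number required by the largest replication config in the cluster:

```java
/**
 * Hypothetical sketch of the early-fail check described above.
 * None of these names come from the Ozone code base.
 */
public final class DecommissionPrecheck {

    private DecommissionPrecheck() { }

    /** An EC data-parity scheme needs one Datanode per data and parity block. */
    static int ecRequiredNodes(int data, int parity) {
        return data + parity;
    }

    /**
     * Returns true if enough Datanodes would remain after decommissioning
     * to satisfy the largest replication requirement in the cluster.
     */
    static boolean canDecommission(int inServiceNodes,
                                   int nodesToDecommission,
                                   int maxRequiredNodes) {
        return inServiceNodes - nodesToDecommission >= maxRequiredNodes;
    }

    public static void main(String[] args) {
        int ecTenFour = ecRequiredNodes(10, 4); // 14 Datanodes

        // 15 DN cluster with EC 10-4 containers: one node can go, two cannot.
        System.out.println(canDecommission(15, 1, ecTenFour)); // true
        System.out.println(canDecommission(15, 2, ecTenFour)); // false

        // 5 DN cluster with RATIS-THREE, decommissioning 3 nodes leaves only 2.
        System.out.println(canDecommission(5, 3, 3)); // false
    }
}
```

   Under these assumptions, a request that fails the check can be rejected 
immediately instead of being retried every 30 seconds.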
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-10462
   
   ## How was this patch tested?
   
   Tested manually in a docker cluster. The following cluster has 5 DNs, and 
RATIS-THREE is its maximum replication factor:
   ```
   bash-4.2$ ozone admin datanode decommission ozone-datanode-4 ozone-datanode-5 ozone-datanode-2
   Started decommissioning datanode(s):
   ozone-datanode-4
   ozone-datanode-5
   ozone-datanode-2
   Error: AllHosts: Sufficient nodes are not available.
   Some nodes could not enter the decommission workflow
   ``` 
   I will add unit tests and then mark this PR _ready for review_.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
