[PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]

via GitHub Thu, 08 Jan 2026 23:07:54 -0800


sreejasahithi opened a new pull request, #9611:
URL: https://github.com/apache/ozone/pull/9611


   ## What changes were proposed in this pull request?
   This PR adds an `--all` option that shows the safemode status of each SCM node in the cluster.
   With `--verbose`, it also shows the status of each safemode exit rule for each SCM node.
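
For illustration only, the per-node behaviour described above can be sketched as a loop that queries every node and keeps going when one of them is unreachable. This is not the PR's Java code: `collect_safemode_status`, the `(node_id, address)` tuples, and the `query` callable are hypothetical stand-ins, and only the line format is taken from the test output in this description.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical node descriptor: (node_id, "host:port"). Not an Ozone type.
ScmNode = Tuple[str, str]


def collect_safemode_status(
    nodes: Iterable[ScmNode],
    query: Callable[[str], bool],
) -> List[str]:
    """Ask every SCM node for its safe mode state, one output line per node.

    `query(address)` stands in for the real RPC (an assumption): it returns
    True when the node is in safe mode and raises if the node is unreachable.
    """
    lines = []
    for node_id, address in nodes:
        prefix = f"{address} [{node_id}]: "
        try:
            in_safe_mode = query(address)
        except Exception:
            # A node that is down must not abort the loop; report it and move on.
            lines.append(
                prefix
                + f"ERROR: Failed to get safe mode status from SCM node {node_id}"
            )
            continue
        lines.append(
            prefix + ("IN SAFE MODE" if in_safe_mode else "OUT OF SAFE MODE")
        )
    return lines
```

Catching the failure per node, rather than around the whole loop, is what lets the remaining nodes still be reported when one SCM is down.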
   
   This PR also fixes the bug reported in [HDDS-13832](https://issues.apache.org/jira/browse/HDDS-13832): in an HA cluster, the `--scm` option was silently ignored and the command always showed the status of the leader SCM instead of the node specified.
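
As a rough sketch of the intended behaviour (not the actual fix), target selection could look like the following. `resolve_target`, the `configured` mapping, and `leader_id` are hypothetical names; falling back to the leader when `--scm` is absent and raising on an unknown address are both assumptions, not something this PR states.

```python
from typing import Dict, Optional


def resolve_target(
    configured: Dict[str, str],
    scm_option: Optional[str],
    leader_id: str,
) -> str:
    """Return the id of the SCM node whose status should be queried.

    `configured` maps hypothetical node ids to "host:port" addresses.
    """
    if scm_option is None:
        # Assumption: with no explicit target, the leader is queried.
        return leader_id
    for node_id, address in configured.items():
        if address == scm_option:
            # Honour the requested node even when it is not the leader.
            return node_id
    # Assumption: refuse to guess rather than silently answer for the leader.
    raise ValueError(f"{scm_option} is not a configured SCM node")
```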
   
   ## What is the link to the Apache JIRA
   
   [HDDS-14108](https://issues.apache.org/jira/browse/HDDS-14108)
   
   ## How was this patch tested?
   This patch was tested locally in a docker ozone-ha cluster:
   ```
   bash-5.1$ ozone admin safemode status --all --verbose
   Service ID: scmservice
   scm1:9860 [scm1]: OUT OF SAFE MODE
   validated:true, DataNodeSafeModeRule, registered datanodes (=1) >= required datanodes (=1)
   validated:true, RatisContainerSafeModeRule, 100.00% of [RATIS] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
   validated:true, HealthyPipelineSafeModeRule, healthy Ratis/THREE pipelines (=0) >= healthyPipelineThresholdCount (=0)
   validated:true, StateMachineReadyRule, Refreshed SCM State Machine after leader ready: true
   validated:true, OneReplicaPipelineSafeModeRule, reported Ratis/THREE pipelines with at least one datanode (=0) >= threshold (=0)
   validated:true, ECContainerSafeModeRule, 100.00% of [EC] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
   scm2:9860 [scm2]: IN SAFE MODE
   validated:false, DataNodeSafeModeRule, registered datanodes (=0) >= required datanodes (=1)
   validated:true, RatisContainerSafeModeRule, 100.00% of [RATIS] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
   validated:true, HealthyPipelineSafeModeRule, healthy Ratis/THREE pipelines (=0) >= healthyPipelineThresholdCount (=0)
   validated:true, StateMachineReadyRule, Refreshed SCM State Machine after leader ready: true
   validated:true, OneReplicaPipelineSafeModeRule, reported Ratis/THREE pipelines with at least one datanode (=0) >= threshold (=0)
   validated:true, ECContainerSafeModeRule, 100.00% of [EC] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
   scm3:9860 [scm3]: OUT OF SAFE MODE
   validated:true, DataNodeSafeModeRule, registered datanodes (=1) >= required datanodes (=1)
   validated:true, RatisContainerSafeModeRule, 100.00% of [RATIS] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
   validated:true, HealthyPipelineSafeModeRule, healthy Ratis/THREE pipelines (=0) >= healthyPipelineThresholdCount (=0)
   validated:true, StateMachineReadyRule, Refreshed SCM State Machine after leader ready: true
   validated:true, OneReplicaPipelineSafeModeRule, reported Ratis/THREE pipelines with at least one datanode (=0) >= threshold (=0)
   validated:true, ECContainerSafeModeRule, 100.00% of [EC] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
   
   bash-5.1$ ozone admin safemode status --all --verbose
   Service ID: scmservice
   scm1:9860 [scm1]: OUT OF SAFE MODE
   validated:true, DataNodeSafeModeRule, registered datanodes (=1) >= required datanodes (=1)
   validated:true, RatisContainerSafeModeRule, 100.00% of [RATIS] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
   validated:true, HealthyPipelineSafeModeRule, healthy Ratis/THREE pipelines (=0) >= healthyPipelineThresholdCount (=0)
   validated:true, StateMachineReadyRule, Refreshed SCM State Machine after leader ready: true
   validated:true, OneReplicaPipelineSafeModeRule, reported Ratis/THREE pipelines with at least one datanode (=0) >= threshold (=0)
   validated:true, ECContainerSafeModeRule, 100.00% of [EC] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
   scm2:9860 [scm2]: OUT OF SAFE MODE
   validated:true, DataNodeSafeModeRule, registered datanodes (=1) >= required datanodes (=1)
   validated:true, RatisContainerSafeModeRule, 100.00% of [RATIS] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
   validated:true, HealthyPipelineSafeModeRule, healthy Ratis/THREE pipelines (=0) >= healthyPipelineThresholdCount (=0)
   validated:true, StateMachineReadyRule, Refreshed SCM State Machine after leader ready: true
   validated:true, OneReplicaPipelineSafeModeRule, reported Ratis/THREE pipelines with at least one datanode (=0) >= threshold (=0)
   validated:true, ECContainerSafeModeRule, 100.00% of [EC] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
   scm3:9860 [scm3]: OUT OF SAFE MODE
   validated:true, DataNodeSafeModeRule, registered datanodes (=1) >= required datanodes (=1)
   validated:true, RatisContainerSafeModeRule, 100.00% of [RATIS] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
   validated:true, HealthyPipelineSafeModeRule, healthy Ratis/THREE pipelines (=0) >= healthyPipelineThresholdCount (=0)
   validated:true, StateMachineReadyRule, Refreshed SCM State Machine after leader ready: true
   validated:true, OneReplicaPipelineSafeModeRule, reported Ratis/THREE pipelines with at least one datanode (=0) >= threshold (=0)
   validated:true, ECContainerSafeModeRule, 100.00% of [EC] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
   ```
   When one of the SCM nodes is down:
   ```
   bash-5.1$ ozone admin safemode status --all --verbose
   Service ID: scmservice
   scm1:9860 [scm1]: OUT OF SAFE MODE
   validated:true, DataNodeSafeModeRule, registered datanodes (=1) >= required datanodes (=1)
   validated:true, RatisContainerSafeModeRule, 100.00% of [RATIS] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
   validated:true, HealthyPipelineSafeModeRule, healthy Ratis/THREE pipelines (=0) >= healthyPipelineThresholdCount (=0)
   validated:true, StateMachineReadyRule, Refreshed SCM State Machine after leader ready: true
   validated:true, OneReplicaPipelineSafeModeRule, reported Ratis/THREE pipelines with at least one datanode (=0) >= threshold (=0)
   validated:true, ECContainerSafeModeRule, 100.00% of [EC] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
   scm2:9860 [scm2]: ERROR: Failed to get safe mode status from SCM node scm2
   scm3:9860 [scm3]: OUT OF SAFE MODE
   validated:true, DataNodeSafeModeRule, registered datanodes (=1) >= required datanodes (=1)
   validated:true, RatisContainerSafeModeRule, 100.00% of [RATIS] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
   validated:true, HealthyPipelineSafeModeRule, healthy Ratis/THREE pipelines (=0) >= healthyPipelineThresholdCount (=0)
   validated:true, StateMachineReadyRule, Refreshed SCM State Machine after leader ready: true
   validated:true, OneReplicaPipelineSafeModeRule, reported Ratis/THREE pipelines with at least one datanode (=0) >= threshold (=0)
   validated:true, ECContainerSafeModeRule, 100.00% of [EC] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
   ```
   ```
   bash-5.1$ ozone admin safemode status --scm=scm2:9860
   Service ID: scmservice
   scm2:9860 [scm2]: ERROR: Failed to get safe mode status from SCM node scm2
   ```
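
For anyone who wants to script against this output, here is a minimal parsing sketch. It assumes the text of a single invocation in the format shown above, without the shell prompt, and it is not part of the PR. `parse_safemode_output` and `nodes_needing_attention` are hypothetical helper names, and the output format itself is not a stable interface as far as this description says.

```python
import re
from typing import Dict, List

# Matches node header lines such as "scm1:9860 [scm1]: OUT OF SAFE MODE".
NODE_LINE = re.compile(
    r"^(?P<address>\S+:\d+) \[(?P<node>[^\]]+)\]: (?P<status>.+)$"
)


def parse_safemode_output(text: str) -> Dict[str, dict]:
    """Group the output by node id: address, status string, and rule entries."""
    nodes: Dict[str, dict] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("Service ID:"):
            continue
        match = NODE_LINE.match(line)
        if match:
            current = {
                "address": match["address"],
                "status": match["status"],
                "rules": [],
            }
            nodes[match["node"]] = current
        elif current is None:
            continue  # nothing to attach to before the first node line
        elif line.startswith("validated:"):
            parts = line.split(", ", 2)
            if len(parts) == 3:
                current["rules"].append({
                    "validated": parts[0] == "validated:true",
                    "name": parts[1],
                    "detail": parts[2],
                })
        elif current["rules"]:
            # Continuation of a rule line that was wrapped, e.g. by a mail client.
            current["rules"][-1]["detail"] += " " + line
    return nodes


def nodes_needing_attention(nodes: Dict[str, dict]) -> List[str]:
    """Node ids that are still in safe mode or could not be reached."""
    return [n for n, info in nodes.items() if info["status"] != "OUT OF SAFE MODE"]
```

Treating a non-matching line as a continuation of the previous rule keeps the parser usable on output whose long lines were hard-wrapped.
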
   Green CI : https://github.com/sreejasahithi/ozone/actions/runs/20842284515
   



