sreejasahithi opened a new pull request, #9611: URL: https://github.com/apache/ozone/pull/9611
## What changes were proposed in this pull request?

This PR adds an `--all` option that shows the safemode status of each SCM node in the cluster. With `--verbose`, it also shows the status of each safemode exit rule for each SCM node.

This PR also fixes the bug described in [HDDS-13832](https://issues.apache.org/jira/browse/HDDS-13832): when the `--scm` option is used in an HA cluster, the command always shows the status of the leader SCM and silently ignores the node specified via the option.

## What is the link to the Apache JIRA

[HDDS-14108](https://issues.apache.org/jira/browse/HDDS-14108)

## How was this patch tested?

This patch was tested locally in a Docker ozone-ha cluster:

```
bash-5.1$ ozone admin safemode status --all --verbose
Service ID: scmservice
scm1:9860 [scm1]: OUT OF SAFE MODE
validated:true, DataNodeSafeModeRule, registered datanodes (=1) >= required datanodes (=1)
validated:true, RatisContainerSafeModeRule, 100.00% of [RATIS] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
validated:true, HealthyPipelineSafeModeRule, healthy Ratis/THREE pipelines (=0) >= healthyPipelineThresholdCount (=0)
validated:true, StateMachineReadyRule, Refreshed SCM State Machine after leader ready: true
validated:true, OneReplicaPipelineSafeModeRule, reported Ratis/THREE pipelines with at least one datanode (=0) >= threshold (=0)
validated:true, ECContainerSafeModeRule, 100.00% of [EC] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
scm2:9860 [scm2]: IN SAFE MODE
validated:false, DataNodeSafeModeRule, registered datanodes (=0) >= required datanodes (=1)
validated:true, RatisContainerSafeModeRule, 100.00% of [RATIS] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
validated:true, HealthyPipelineSafeModeRule, healthy Ratis/THREE pipelines (=0) >= healthyPipelineThresholdCount (=0)
validated:true, StateMachineReadyRule, Refreshed SCM State Machine after leader ready: true
validated:true, OneReplicaPipelineSafeModeRule, reported Ratis/THREE pipelines with at least one datanode (=0) >= threshold (=0)
validated:true, ECContainerSafeModeRule, 100.00% of [EC] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
scm3:9860 [scm3]: OUT OF SAFE MODE
validated:true, DataNodeSafeModeRule, registered datanodes (=1) >= required datanodes (=1)
validated:true, RatisContainerSafeModeRule, 100.00% of [RATIS] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
validated:true, HealthyPipelineSafeModeRule, healthy Ratis/THREE pipelines (=0) >= healthyPipelineThresholdCount (=0)
validated:true, StateMachineReadyRule, Refreshed SCM State Machine after leader ready: true
validated:true, OneReplicaPipelineSafeModeRule, reported Ratis/THREE pipelines with at least one datanode (=0) >= threshold (=0)
validated:true, ECContainerSafeModeRule, 100.00% of [EC] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);

bash-5.1$ ozone admin safemode status --all --verbose
Service ID: scmservice
scm1:9860 [scm1]: OUT OF SAFE MODE
validated:true, DataNodeSafeModeRule, registered datanodes (=1) >= required datanodes (=1)
validated:true, RatisContainerSafeModeRule, 100.00% of [RATIS] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
validated:true, HealthyPipelineSafeModeRule, healthy Ratis/THREE pipelines (=0) >= healthyPipelineThresholdCount (=0)
validated:true, StateMachineReadyRule, Refreshed SCM State Machine after leader ready: true
validated:true, OneReplicaPipelineSafeModeRule, reported Ratis/THREE pipelines with at least one datanode (=0) >= threshold (=0)
validated:true, ECContainerSafeModeRule, 100.00% of [EC] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
scm2:9860 [scm2]: OUT OF SAFE MODE
validated:true, DataNodeSafeModeRule, registered datanodes (=1) >= required datanodes (=1)
validated:true, RatisContainerSafeModeRule, 100.00% of [RATIS] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
validated:true, HealthyPipelineSafeModeRule, healthy Ratis/THREE pipelines (=0) >= healthyPipelineThresholdCount (=0)
validated:true, StateMachineReadyRule, Refreshed SCM State Machine after leader ready: true
validated:true, OneReplicaPipelineSafeModeRule, reported Ratis/THREE pipelines with at least one datanode (=0) >= threshold (=0)
validated:true, ECContainerSafeModeRule, 100.00% of [EC] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
scm3:9860 [scm3]: OUT OF SAFE MODE
validated:true, DataNodeSafeModeRule, registered datanodes (=1) >= required datanodes (=1)
validated:true, RatisContainerSafeModeRule, 100.00% of [RATIS] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
validated:true, HealthyPipelineSafeModeRule, healthy Ratis/THREE pipelines (=0) >= healthyPipelineThresholdCount (=0)
validated:true, StateMachineReadyRule, Refreshed SCM State Machine after leader ready: true
validated:true, OneReplicaPipelineSafeModeRule, reported Ratis/THREE pipelines with at least one datanode (=0) >= threshold (=0)
validated:true, ECContainerSafeModeRule, 100.00% of [EC] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
```

When one of the SCM nodes is down:

```
bash-5.1$ ozone admin safemode status --all --verbose
Service ID: scmservice
scm1:9860 [scm1]: OUT OF SAFE MODE
validated:true, DataNodeSafeModeRule, registered datanodes (=1) >= required datanodes (=1)
validated:true, RatisContainerSafeModeRule, 100.00% of [RATIS] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
validated:true, HealthyPipelineSafeModeRule, healthy Ratis/THREE pipelines (=0) >= healthyPipelineThresholdCount (=0)
validated:true, StateMachineReadyRule, Refreshed SCM State Machine after leader ready: true
validated:true, OneReplicaPipelineSafeModeRule, reported Ratis/THREE pipelines with at least one datanode (=0) >= threshold (=0)
validated:true, ECContainerSafeModeRule, 100.00% of [EC] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
scm2:9860 [scm2]: ERROR: Failed to get safe mode status from SCM node scm2
scm3:9860 [scm3]: OUT OF SAFE MODE
validated:true, DataNodeSafeModeRule, registered datanodes (=1) >= required datanodes (=1)
validated:true, RatisContainerSafeModeRule, 100.00% of [RATIS] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
validated:true, HealthyPipelineSafeModeRule, healthy Ratis/THREE pipelines (=0) >= healthyPipelineThresholdCount (=0)
validated:true, StateMachineReadyRule, Refreshed SCM State Machine after leader ready: true
validated:true, OneReplicaPipelineSafeModeRule, reported Ratis/THREE pipelines with at least one datanode (=0) >= threshold (=0)
validated:true, ECContainerSafeModeRule, 100.00% of [EC] Containers(0 / 0) with at least N reported replica (=1.00) >= safeModeCutoff (=0.99);
```

```
bash-5.1$ ozone admin safemode status --scm=scm2:9860
Service ID: scmservice
scm2:9860 [scm2]: ERROR: Failed to get safe mode status from SCM node scm2
```

Green CI: https://github.com/sreejasahithi/ozone/actions/runs/20842284515
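
For anyone who wants to script against the per-node output, the layout shown in the test runs above could be parsed roughly as follows. This is only a sketch and not part of the PR: the helper name `parse_safemode_status` and the regular expression are assumptions inferred from the sample output in this message, and the real output format may differ or change.

```python
import re
from typing import Dict

# Matches node header lines such as "scm1:9860 [scm1]: OUT OF SAFE MODE".
# The pattern is inferred from the sample output above, not from the Ozone source.
NODE_LINE = re.compile(r"^(?P<addr>\S+:\d+) \[(?P<node>[^\]]+)\]: (?P<status>.+)$")


def parse_safemode_status(output: str) -> Dict[str, dict]:
    """Group status and rule lines by SCM node id (hypothetical helper)."""
    nodes: Dict[str, dict] = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        match = NODE_LINE.match(line)
        if match:
            # Start a new per-node record when a node header line is seen.
            current = {
                "address": match["addr"],
                "status": match["status"],
                "rules": [],
            }
            nodes[match["node"]] = current
        elif line.startswith("validated:") and current is not None:
            # "validated:true, RuleName, description" -> (RuleName, True)
            parts = line.split(",", 2)
            if len(parts) >= 2:
                passed = parts[0].strip() == "validated:true"
                current["rules"].append((parts[1].strip(), passed))
    return nodes


if __name__ == "__main__":
    sample = "\n".join([
        "Service ID: scmservice",
        "scm1:9860 [scm1]: OUT OF SAFE MODE",
        "validated:true, DataNodeSafeModeRule, registered datanodes (=1) >= required datanodes (=1)",
        "scm2:9860 [scm2]: ERROR: Failed to get safe mode status from SCM node scm2",
    ])
    for node, info in parse_safemode_status(sample).items():
        print(node, info["status"], info["rules"])
```

Keying the result by the node id in square brackets keeps an unreachable node (reported with an `ERROR:` status and no rule lines) in the result, so a caller can tell it apart from a node that is still in safe mode.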
