[
https://issues.apache.org/jira/browse/HDDS-13882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18039069#comment-18039069
]
Sumit Agrawal edited comment on HDDS-13882 at 11/18/25 8:36 AM:
----------------------------------------------------------------
[~ssulav]
This is a negative scenario in which the DNs are logically unavailable for any
write operation. The rules defined at SCM require at least 1 pipeline to be
available for writes. If no such pipeline is available and SCM still needs to
leave safe mode, the options are:
# force exit
# change the safe mode rule configuration
This is not an issue and {*}need not be fixed{*}; it is an environment issue,
where the setup lacks the configuration needed to allow this case.
A Datanode remains available for read-only access when no writes are allowed.
This is a valid case, so the node should still be reported as healthy.
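To make the "not available for write" condition concrete, here is a minimal sketch of the space arithmetic, using the two byte counts from the error message in the issue description quoted below. The helper name and the idea of comparing data and metadata space independently are assumptions for illustration only; the actual SCM node-selection logic may account for volumes and reserved space differently.

```python
# Requirements copied from the "ozone admin pipeline create" error message.
METADATA_REQUIRED_BYTES = 1073741824  # 1 GiB for metadata
DATA_REQUIRED_BYTES = 5368709120      # 5 GiB for data

GIB = 1024 ** 3


def has_space_for_container(data_available_bytes: int,
                            metadata_available_bytes: int) -> bool:
    """Hypothetical helper: True if both space requirements are met."""
    return (data_available_bytes >= DATA_REQUIRED_BYTES
            and metadata_available_bytes >= METADATA_REQUIRED_BYTES)


# df reports 4.5G available on /; this sketch treats that as 4.5 GiB
# and assumes data and metadata share the same filesystem.
available = int(4.5 * GIB)
print(has_space_for_container(available, available))
```

With 4.5 GiB free the data requirement of 5 GiB is not met, so the node is excluded from pipeline placement even though it is still registered and readable.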
> Datanode status as HEALTHY even for NoDiskSpace
> -----------------------------------------------
>
> Key: HDDS-13882
> URL: https://issues.apache.org/jira/browse/HDDS-13882
> Project: Apache Ozone
> Issue Type: Bug
> Components: Ozone Datanode
> Affects Versions: 2.0.0
> Reporter: Soumitra Sulav
> Assignee: Siddhant Sangwan
> Priority: Critical
>
>
> Datanode status is shown as HEALTHY even when the available capacity in the
> datanode dir on each datanode is just 4.5 GB.
> Pipeline creation fails because it cannot allocate the minimum 5 GB for a
> container.
> {code:java}
> scm@installer-4:~$ ozone admin pipeline create
> Unable to find enough nodes that meet the space requirement of 1073741824
> bytes for metadata and 5368709120 bytes for data in healthy node set. Nodes
> required: 1 Found: 0
> scm@installer-4:~$ df -Th
> Filesystem Type Size Used Avail Use% Mounted on
> /dev/root ext4 7.6G 3.2G 4.5G 42% /
> devtmpfs devtmpfs 1.9G 0 1.9G 0% /dev
> tmpfs tmpfs 1.9G 0 1.9G 0% /dev/shm
> tmpfs tmpfs 382M 944K 381M 1% /run
> tmpfs tmpfs 5.0M 0 5.0M 0% /run/lock
> tmpfs tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
> /dev/loop0 squashfs 24M 24M 0 100% /snap/amazon-ssm-agent/11321
> /dev/loop1 squashfs 60M 60M 0 100% /snap/core20/2603
> /dev/loop3 squashfs 92M 92M 0 100% /snap/lxd/32669
> /dev/loop2 squashfs 69M 69M 0 100% /snap/core22/2012
> /dev/loop4 squashfs 45M 45M 0 100% /snap/snapd/24672
> /dev/nvme0n1p15 vfat 98M 6.3M 92M 7% /boot/efi
> tmpfs tmpfs 382M 0 382M 0% /run/user/0
> root@installer-4:~# ozone admin datanode list
> Datanode: 920fd52c-9140-46b8-bc19-1c6ddb701cba
> (/default-rack/10.65.157.76/installer-4.domain/0 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Related pipelines:
> No pipelines in cluster.
> Datanode: b7b077dd-7797-4992-9960-753cadeb51bb
> (/default-rack/10.65.147.12/installer-6.domain/0 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Related pipelines:
> No pipelines in cluster.
> Datanode: c2af08f4-4494-46df-81e1-ea0eebc6b150
> (/default-rack/10.65.154.72/installer-9.domain/0 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Related pipelines:
> No pipelines in cluster.
> Datanode: e6b7edc3-eb20-451f-947e-99147e76c188
> (/default-rack/10.65.156.11/installer-7.domain/0 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Related pipelines:
> No pipelines in cluster.
> Datanode: e6dfe2cc-03ec-4747-95cc-a6b4a957bdf2
> (/default-rack/10.65.159.174/installer-8.domain/0 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Related pipelines:
> No pipelines in cluster.
> Datanode: 001e2d76-ef05-493a-bdb6-d9debf1af2ea
> (/default-rack/10.65.144.98/installer-5.domain/0 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Related pipelines:
> No pipelines in cluster.
> Datanode: 3261dcbf-7f78-4164-b9e2-b13e03f9a8d2
> (/default-rack/10.65.150.32/installer-10.domain/0 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Related pipelines:
> No pipelines in cluster. {code}
> Due to this, SCM stays in safe mode indefinitely even though all prechecks
> have passed.
> {code:java}
> root@installer-4:~# egrep 'SafeMode|Entering startup'
> /opt/ozone/current/logs/ozone-scm-scm-installer-4.domain.log
> 2025-11-04 12:13:54,164 [main] INFO
> org.apache.hadoop.hdds.scm.node.SCMNodeManager: Entering startup safe mode.
> 2025-11-04 12:13:54,295 [main] INFO
> org.apache.hadoop.hdds.scm.safemode.ContainerSafeModeRule: Refreshed
> Containers with one replica threshold count 0, with ec n replica threshold
> count 0.
> 2025-11-04 12:13:54,298 [main] INFO
> org.apache.hadoop.hdds.scm.safemode.HealthyPipelineSafeModeRule: Total
> pipeline count is 0, healthy pipeline threshold count is 0
> 2025-11-04 12:13:54,299 [main] INFO
> org.apache.hadoop.hdds.scm.safemode.OneReplicaPipelineSafeModeRule: Total
> pipeline count is 0, pipeline's with at least one datanode reported threshold
> count is 0
> 2025-11-04 12:13:55,006 [main] INFO org.apache.hadoop.hdds.scm.ha.SCMContext:
> Update SafeModeStatus from SafeModeStatus{safeModeStatus=true,
> preCheckPassed=false} to SafeModeStatus{safeModeStatus=true,
> preCheckPassed=false}.
> 2025-11-04 12:14:00,321
> [b1623451-5346-40b2-b1fc-5faa7c649b83@group-7D6A6F9524D7-StateMachineUpdater]
> INFO org.apache.hadoop.hdds.scm.safemode.HealthyPipelineSafeModeRule:
> Refreshed total pipeline count is 0, healthy pipeline threshold count is 0
> 2025-11-04 12:14:00,321
> [b1623451-5346-40b2-b1fc-5faa7c649b83@group-7D6A6F9524D7-StateMachineUpdater]
> INFO org.apache.hadoop.hdds.scm.safemode.ContainerSafeModeRule: Refreshed
> Containers with one replica threshold count 0, with ec n replica threshold
> count 0.
> 2025-11-04 12:14:00,321
> [b1623451-5346-40b2-b1fc-5faa7c649b83@group-7D6A6F9524D7-StateMachineUpdater]
> INFO org.apache.hadoop.hdds.scm.safemode.OneReplicaPipelineSafeModeRule:
> Refreshed Total pipeline count is 0, pipeline's with at least one datanode
> reported threshold count is 0
> 2025-11-04 12:14:35,593
> [scm1-EventQueue-ContainerRegistrationReportForContainerSafeModeRule] INFO
> org.apache.hadoop.hdds.scm.safemode.SCMSafeModeManager: ContainerSafeModeRule
> rule is successfully validated
> 2025-11-04 12:14:35,593
> [scm1-EventQueue-NodeRegistrationContainerReportForDataNodeSafeModeRule] INFO
> org.apache.hadoop.hdds.scm.safemode.SCMSafeModeManager: SCM in safe mode. 1
> DataNodes registered, 1 required.
> 2025-11-04 12:14:35,593
> [scm1-EventQueue-NodeRegistrationContainerReportForDataNodeSafeModeRule] INFO
> org.apache.hadoop.hdds.scm.safemode.SCMSafeModeManager: DataNodeSafeModeRule
> rule is successfully validated
> 2025-11-04 12:14:35,593
> [scm1-EventQueue-NodeRegistrationContainerReportForDataNodeSafeModeRule] INFO
> org.apache.hadoop.hdds.scm.safemode.SCMSafeModeManager: All SCM safe mode pre
> check rules have passed
> 2025-11-04 12:14:35,593
> [scm1-EventQueue-NodeRegistrationContainerReportForDataNodeSafeModeRule] INFO
> org.apache.hadoop.hdds.scm.ha.SCMContext: Update SafeModeStatus from
> SafeModeStatus{safeModeStatus=true, preCheckPassed=false} to
> SafeModeStatus{safeModeStatus=true, preCheckPassed=true}.
> 2025-11-04 12:14:35,593
> [scm1-EventQueue-NodeRegistrationContainerReportForDataNodeSafeModeRule] INFO
> org.apache.hadoop.hdds.scm.pipeline.BackgroundPipelineCreator: trigger a
> one-shot run on scm1-RatisPipelineUtilsThread.
> 2025-11-04 12:14:35,593
> [scm1-EventQueue-PipelineReportForOneReplicaPipelineSafeModeRule] INFO
> org.apache.hadoop.hdds.scm.safemode.SCMSafeModeManager:
> AtleastOneDatanodeReportedRule rule is successfully validated {code}
> {code:java}
> root@installer-4:~# ozone admin safemode status --verbose
> SCM is in safe mode.
> validated:true, DataNodeSafeModeRule, registered datanodes (=1) >= required
> datanodes (=1)
> validated:true, HealthyPipelineSafeModeRule, healthy Ratis/THREE pipelines
> (=0) >= healthyPipelineThresholdCount (=0)
> validated:true, ContainerSafeModeRule, 100.00% of [Ratis] Containers(0 / 0)
> with at least one reported replica (=1.00) >= safeModeCutoff (=0.99);
> 100.00% of [EC] Containers(0 / 0) with at least N reported replica (=1.00) >=
> safeModeCutoff (=0.99);
> validated:true, AtleastOneDatanodeReportedRule, reported Ratis/THREE
> pipelines with at least one datanode (=0) >= threshold (=0)
> root@installer-4:~# ozone admin scm roles
> installer-4.vpc.cloudera.com:9894:LEADER:b1623451-5346-40b2-b1fc-5faa7c649b83:10.65.157.76
> installer-6.vpc.cloudera.com:9894:FOLLOWER:02a1f01e-0f38-427c-959d-c7733a07d106:10.65.147.12
> installer-5.vpc.cloudera.com:9894:FOLLOWER:738a934b-6466-46d1-b2e2-b10dcdaa45ec:10.65.144.98
> root@installer-4:~# ozone admin om roles
> om1 : FOLLOWER (installer-4.vpc.cloudera.com)
> om2 : FOLLOWER (installer-5.vpc.cloudera.com)
> om3 : LEADER (installer-6.vpc.cloudera.com){code}