(ozone-site) branch HDDS-9225-website-v2 updated: HDDS-14327. [Website v2][Docs][Administrator Guide] Reorganize Decommissioning and Maintenance Modes for Datanodes (#241)

sarvekshayr Wed, 21 Jan 2026 00:00:54 -0800

This is an automated email from the ASF dual-hosted git repository.

sarvekshayr pushed a commit to branch HDDS-9225-website-v2
in repository https://gitbox.apache.org/repos/asf/ozone-site.git



The following commit(s) were added to refs/heads/HDDS-9225-website-v2 by this push:
     new 43c5b3983 HDDS-14327. [Website v2][Docs][Administrator Guide] Reorganize Decommissioning and Maintenance Modes for Datanodes (#241)
43c5b3983 is described below

commit 43c5b39839a5243d9eb598bf54e9ac883a206913
Author: Sarveksha Yeshavantha Raju <[email protected]>
AuthorDate: Wed Jan 21 13:30:29 2026 +0530

    HDDS-14327. [Website v2][Docs][Administrator Guide] Reorganize Decommissioning and Maintenance Modes for Datanodes (#241)
---
 .../01-datanode-decommission.md}                   | 72 +++-------------------
 .../03-datanodes/02-datanode-maintenance.md        | 69 +++++++++++++++++++++
 .../03-datanodes/README.mdx                        | 17 +++++
 3 files changed, 95 insertions(+), 63 deletions(-)

diff --git a/docs/05-administrator-guide/03-operations/03-node-decommissioning-and-maintenance/03-datanodes.md b/docs/05-administrator-guide/03-operations/03-node-decommissioning-and-maintenance/03-datanodes/01-datanode-decommission.md
similarity index 58%
rename from docs/05-administrator-guide/03-operations/03-node-decommissioning-and-maintenance/03-datanodes.md
rename to docs/05-administrator-guide/03-operations/03-node-decommissioning-and-maintenance/03-datanodes/01-datanode-decommission.md
index 53e814553..be2a3082f 100644
--- a/docs/05-administrator-guide/03-operations/03-node-decommissioning-and-maintenance/03-datanodes.md
+++ b/docs/05-administrator-guide/03-operations/03-node-decommissioning-and-maintenance/03-datanodes/01-datanode-decommission.md
@@ -1,10 +1,8 @@
 ---
-sidebar_label: Datanodes
+sidebar_label: Datanode Decommission
 ---
 
-# Decommissioning and Maintenance Modes for Datanodes
-
-## Datanode Decommission
+# Datanode Decommission
 
 The Datanode decommission is the process that removes the existing Datanode 
from the Ozone cluster while ensuring that the new data should not be written 
to the decommissioned Datanode. When you initiate the process of 
decommissioning a Datanode, Ozone automatically ensures that all the storage 
containers on that Datanode have an additional copy created on another Datanode 
before the decommission completes. So, Datanode will keep running after it has 
been decommissioned and may be used f [...]
 
@@ -45,22 +43,22 @@ ozone admin datanode recommission [-hV] [-id=<scmServiceId>]
        [--scm=<scm>] [<hosts>...]
 ```
 
-### Tuning and Monitoring Decommissioning
+## Tuning and Monitoring Decommissioning
 
 The process of decommissioning a Datanode involves replicating all its 
containers to other Datanodes in the cluster. The speed of this process can be 
tuned, and its progress can be monitored using several configuration properties 
and metrics.
 
-#### Configuration Properties
+### Configuration Properties
 
 Administrators can adjust the following properties in `ozone-site.xml` to 
control the container replication speed during decommissioning. They are 
grouped by the component where they are primarily configured.
 
-##### SCM-Side Properties
+#### SCM-Side Properties
 
 - `hdds.scm.replication.datanode.replication.limit`
   - **Purpose**: Defines the base limit for concurrent replication commands 
that the SCM will *send* to a single Datanode.
   - **Default**: `20`.
   - **Details**: The effective limit for a decommissioning Datanode is this 
value multiplied by `hdds.datanode.replication.outofservice.limit.factor`.
 
-##### Datanode-Side Properties
+#### Datanode-Side Properties
 
 - `hdds.datanode.replication.outofservice.limit.factor`
   - **Purpose**: A multiplier to increase replication capacity for 
`DECOMMISSIONING` or `MAINTENANCE` nodes. This is a key property for tuning 
decommission speed.
@@ -79,11 +77,11 @@ Administrators can adjust the following properties in `ozone-site.xml` to contro
 
 By tuning these properties, administrators can balance the decommissioning 
speed against the impact on the cluster's performance.
 
-#### Metrics
+### Metrics
 
 The following metrics can be used to monitor the progress of Datanode 
decommissioning. The names in parentheses are the corresponding Prometheus 
metric names, which may vary slightly depending on the metrics sink 
configuration.
 
-##### SCM-side Metrics (`ReplicationManagerMetrics`)
+#### SCM-side Metrics (`ReplicationManagerMetrics`)
 
 These metrics are available on the SCM and provide a cluster-wide view of the 
replication process. During decommissioning, you should see an increase in 
these metrics. The name in parentheses is the corresponding Prometheus metric 
name.
 
@@ -92,7 +90,7 @@ These metrics are available on the SCM and provide a cluster-wide view of the re
 - `replicasCreatedTotal` 
(`replication_manager_metrics_replicas_created_total`): The total number of 
container replicas successfully created.
 - `replicateContainerCmdsDeferredTotal` 
(`replication_manager_metrics_replicate_container_cmds_deferred_total`): The 
number of replication commands deferred because source Datanodes were 
overloaded. If this value is high, it might indicate that the source Datanodes 
(including the decommissioning one) are too busy.
 
-##### Datanode-side Metrics (`MeasuredReplicator` metrics)
+#### Datanode-side Metrics (`MeasuredReplicator` metrics)
 
 These metrics are available on each Datanode. For a decommissioning node, they 
show its activity as a source of replicas. For other nodes, they show their 
activity as targets. The name in parentheses is the corresponding Prometheus 
metric name.
 
@@ -105,55 +103,3 @@ These metrics are available on each Datanode. For a decommissioning node, they s
 - `queueTime` (`measured_replicator_queue_time`): The total time tasks spend 
in the replication queue. A high value might indicate the Datanode is 
overloaded.
 
 By monitoring these metrics, administrators can get a clear picture of the 
decommissioning progress and identify potential bottlenecks.
-
-## Datanode Maintenance Mode
-
-Maintenance mode is a feature in Apache Ozone that allows you to temporarily 
take a Datanode offline for maintenance operations (e.g., hardware upgrades, 
software updates) without triggering immediate data replication. Unlike 
decommissioning, which aims to permanently remove a Datanode and its data from 
the cluster, maintenance mode is designed for temporary outages.
-
-While in maintenance mode, a Datanode does not accept new writes but may still 
serve reads, assuming containers are healthy and online. Existing data on the 
Datanode will remain in place, and replication of its data will only be 
triggered if the Datanode remains in maintenance mode beyond a configurable 
timeout period. This allows for planned downtime without unnecessary data 
movement, reducing network overhead and cluster load.
-
-The Datanode transitions through the following operational states during 
maintenance:
-
-1. **IN_SERVICE**: The Datanode is fully operational and participating in data 
writes and reads.
-2. **ENTERING_MAINTENANCE**: The Datanode is transitioning into maintenance 
mode. New writes will be avoided.
-3. **IN_MAINTENANCE**: The Datanode is in maintenance mode. Data will not be 
written to it. If the Datanode remains in this state beyond the configured 
maintenance window, its data will start to be replicated to other Datanodes to 
ensure data durability.
-
-### Command Line Usage
-
-To place a Datanode into maintenance mode, use the `ozone admin datanode 
maintenance` command. You can specify a duration for the maintenance period. If 
no duration is specified, a default duration will be used (this can be 
configured).
-
-To check the current state of the Datanodes, including their operational 
state, you can execute the following command:
-
-```shell
-ozone admin datanode list
-```
-
-To start maintenance mode for one or more Datanodes:
-
-```shell
-ozone admin datanode maintenance [-hV] [-id=<scmServiceId>] [--scm=<scm>] 
[--end=<hours>] [--force] [<hosts>...]
-```
-
-- `<hosts>`: A space-separated list of hostnames or IP addresses of the 
Datanodes to put into maintenance mode.
-- `--end=<hours>`: Optional. Automatically end maintenance after the given 
hours. By default, maintenance must be ended manually.
-- `--force`: Optional. Forcefully try to put the Datanode(s) into maintenance 
mode.
-
-To take a Datanode out of maintenance mode and return it to `IN_SERVICE` 
state, you can use the `recommission` command:
-
-```shell
-ozone admin datanode recommission [-hV] [-id=<scmServiceId>] [--scm=<scm>] 
[<hosts>...]
-```
-
-### Configuration Properties
-
-The following properties, typically set in `ozone-site.xml`, are relevant to 
maintenance mode:
-
-- `hdds.scm.replication.maintenance.replica.minimum`: The minimum number of 
container replicas which must be available for a node to enter maintenance. 
Default value is `2`. If putting a node into maintenance reduces the available 
replicas for any container below this level, the node will remain in the 
`ENTERING_MAINTENANCE` state until a new replica is created.
-- `hdds.scm.replication.maintenance.remaining.redundancy`: The number of 
redundant containers in a group which must be available for a node to enter 
maintenance. Default value is `1`. If putting a node into maintenance reduces 
the redundancy below this value, the node will remain in the 
`ENTERING_MAINTENANCE` state until a new replica is created. For Ratis 
containers, the default value of 1 ensures at least two replicas are online, 
meaning 1 more can be lost without data becoming unavail [...]
-
-### Metrics
-
-The following SCM metrics are relevant to Datanode maintenance mode:
-
-- `DecommissioningMaintenanceNodesTotal`: This metric reports the total number 
of Datanodes that are currently in either decommissioning or maintenance mode.
-- `RecommissionNodesTotal`: This metric reports the total number of Datanodes 
that are currently being recommissioned (i.e., returning to `IN_SERVICE` state 
from either decommissioning or maintenance mode).
diff --git a/docs/05-administrator-guide/03-operations/03-node-decommissioning-and-maintenance/03-datanodes/02-datanode-maintenance.md b/docs/05-administrator-guide/03-operations/03-node-decommissioning-and-maintenance/03-datanodes/02-datanode-maintenance.md
new file mode 100644
index 000000000..78a3e697f
--- /dev/null
+++ b/docs/05-administrator-guide/03-operations/03-node-decommissioning-and-maintenance/03-datanodes/02-datanode-maintenance.md
@@ -0,0 +1,69 @@
+---
+sidebar_label: Datanode Maintenance Mode
+---
+
+# Datanode Maintenance Mode
+
+Maintenance mode is a feature in Apache Ozone that allows you to temporarily 
take a Datanode offline for maintenance operations (e.g., hardware upgrades, 
software updates) without triggering immediate data replication. Unlike 
decommissioning, which aims to permanently remove a Datanode and its data from 
the cluster, maintenance mode is designed for temporary outages.
+
+While in maintenance mode, a Datanode does not accept new writes but may still 
serve reads, assuming containers are healthy and online. Existing data on the 
Datanode will remain in place, and replication of its data will only be 
triggered if the Datanode remains in maintenance mode beyond a configurable 
timeout period. This allows for planned downtime without unnecessary data 
movement, reducing network overhead and cluster load.
+
+The Datanode transitions through the following operational states during 
maintenance:
+
+1. **IN_SERVICE**: The Datanode is fully operational and participating in data 
writes and reads.
+2. **ENTERING_MAINTENANCE**: The Datanode is transitioning into maintenance 
mode. New writes will be avoided.
+3. **IN_MAINTENANCE**: The Datanode is in maintenance mode. Data will not be 
written to it. If the Datanode remains in this state beyond the configured 
maintenance window, its data will start to be replicated to other Datanodes to 
ensure data durability.
+
+## Command Line Usage
+
+To place a Datanode into maintenance mode, use the `ozone admin datanode 
maintenance` command. You can specify a duration for the maintenance period. If 
no duration is specified, a default duration will be used (this can be 
configured).
+
+To check the current state of the Datanodes, including their operational 
state, you can execute the following command:
+
+```shell
+ozone admin datanode list
+```
+
+To start maintenance mode for one or more Datanodes:
+
+```shell
+ozone admin datanode maintenance [-hV] [-id=<scmServiceId>] [--scm=<scm>] 
[--end=<hours>] [--force] [<hosts>...]
+```
+
+- `<hosts>`: A space-separated list of hostnames or IP addresses of the 
Datanodes to put into maintenance mode.
+- `--end=<hours>`: Optional. Automatically end maintenance after the given 
hours. By default, maintenance must be ended manually.
+- `--force`: Optional. Forcefully try to put the Datanode(s) into maintenance 
mode.
+
+To take a Datanode out of maintenance mode and return it to `IN_SERVICE` 
state, you can use the `recommission` command:
+
+```shell
+ozone admin datanode recommission [-hV] [-id=<scmServiceId>] [--scm=<scm>] 
[<hosts>...]
+```
+
+## Configuration Properties
+
+The following properties, typically set in `ozone-site.xml`, are relevant to 
maintenance mode:
+
+| Property | Default Value | Description |
+| -------- |---------------|-------------|
+| `hdds.scm.replication.maintenance.replica.minimum` | `2` | The minimum 
number of container replicas which must be available for a node to enter 
maintenance. If putting a node into maintenance reduces the available replicas 
for any container below this level, the node will remain in the 
ENTERING_MAINTENANCE state until a new replica is created. |
+| `hdds.scm.replication.maintenance.remaining.redundancy` | `1` | The number 
of redundant containers in a group which must be available for a node to enter 
maintenance. If putting a node into maintenance reduces the redundancy below 
this value, the node will remain in the `ENTERING_MAINTENANCE` state until a 
new replica is created. For Ratis containers, the default value of 1 ensures at 
least two replicas are online, meaning 1 more can be lost without data becoming 
unavailable. For any E [...]
+
+## Metrics
+
+The following SCM metrics are relevant to Datanode decommissioning and 
maintenance across all tracked nodes.
+
+- `DecommissioningMaintenanceNodesTotal`: This metric reports the total number 
of Datanodes that are currently in either decommissioning or maintenance mode.
+- `RecommissionNodesTotal`: This metric reports the total number of Datanodes 
that are currently being recommissioned (i.e., returning to `IN_SERVICE` state 
from either decommissioning or maintenance mode).
+- `PipelinesWaitingToCloseTotal`: This metric reports the total number of 
Datanodes tracked with pipelines waiting to close.
+- `ContainersUnderReplicatedTotal`: This metric reports the total number of 
containers under replicated in tracked nodes.
+- `ContainersUnClosedTotal`: This metric reports the total number of 
containers not fully closed in tracked nodes.
+- `ContainersSufficientlyReplicatedTotal`: This metric reports the total 
number of containers sufficiently replicated in tracked nodes.
+
+The following SCM metrics are relevant to Datanode decommissioning and 
maintenance per node.
+
+- `UnderReplicatedDN`: Number of under-replicated containers for the specific 
host
+- `PipelinesWaitingToCloseDN`: Number of pipelines waiting to close for the 
specific host
+- `SufficientlyReplicatedDN`: Number of sufficiently replicated containers for 
the specific host
+- `UnclosedContainersDN`: Number of containers not fully closed for the 
specific host
+- `StartTimeDN`: Timestamp when decommissioning was started for the specific 
host
diff --git a/docs/05-administrator-guide/03-operations/03-node-decommissioning-and-maintenance/03-datanodes/README.mdx b/docs/05-administrator-guide/03-operations/03-node-decommissioning-and-maintenance/03-datanodes/README.mdx
new file mode 100644
index 000000000..e5206dab3
--- /dev/null
+++ b/docs/05-administrator-guide/03-operations/03-node-decommissioning-and-maintenance/03-datanodes/README.mdx
@@ -0,0 +1,17 @@
+---
+sidebar_label: Datanode
+---
+
+# Datanode Decommissioning and Maintenance
+
+import DocCardList from '@theme/DocCardList';
+
+## Datanode Decommission
+
+The Datanode Decommission is the process that removes the existing Datanode 
from the Ozone cluster while ensuring that the new data should not be written 
to the decommissioned Datanode. When you initiate the process of 
decommissioning a Datanode, Ozone automatically ensures that all the storage 
containers on that Datanode have an additional copy created on another Datanode 
before the decommission completes. So, Datanode will keep running after it has 
been decommissioned and may be used f [...]
+
+## Datanode Maintenance Mode
+
+The Datanode Maintenance mode is a feature in Apache Ozone that allows you to 
temporarily take a Datanode offline for maintenance operations (e.g., hardware 
upgrades, software updates) without triggering immediate data replication. 
Unlike decommissioning, which aims to permanently remove a Datanode and its 
data from the cluster, maintenance mode is designed for temporary outages.
+
+<DocCardList/>
\ No newline at end of file

