This is an automated email from the ASF dual-hosted git repository.

jojochuang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/ozone-site.git


The following commit(s) were added to refs/heads/master by this push:
     new 37c57befd HDDS-15525. Update DiskBalancer Doc and Blog with latest changes (#462)
37c57befd is described below

commit 37c57befdb1ad7fbd5e250fda2011af0a79d2c1f
Author: Gargi Jaiswal <[email protected]>
AuthorDate: Tue Jun 16 10:04:36 2026 +0530

    HDDS-15525. Update DiskBalancer Doc and Blog with latest changes (#462)
---
 blog/2026-01-29-disk-balancer-preview.md           | 23 ++++++--
 cspell.yaml                                        |  1 +
 .../05-data-balancing/02-disk-balancer.md          | 69 ++++++++++++----------
 3 files changed, 57 insertions(+), 36 deletions(-)

diff --git a/blog/2026-01-29-disk-balancer-preview.md b/blog/2026-01-29-disk-balancer-preview.md
index e63ceec62..04377d700 100644
--- a/blog/2026-01-29-disk-balancer-preview.md
+++ b/blog/2026-01-29-disk-balancer-preview.md
@@ -37,14 +37,15 @@ Balancing is local and safe:
 - A scheduler periodically checks for imbalance and dispatches copy-and-import 
tasks.
 - Bandwidth and concurrency are **operator-tunable** to avoid interfering with 
production I/O.
 
-This runs independently on each Datanode. To use it, first enable the feature 
by setting `hdds.datanode.disk.balancer.enabled = true` in `ozone-site.xml` on 
your Datanodes. Once enabled, clients use `ozone admin datanode diskbalancer` 
commands to talk directly to Datanodes, with SCM only used to discover 
IN_SERVICE Datanodes when running batch operations with 
`--in-service-datanodes`.
+This runs independently on each Datanode. The feature can be disabled by 
setting `hdds.datanode.disk.balancer.enabled = false` in `ozone-site.xml` on 
your Datanodes. Once disabled, clients can no longer use `ozone admin datanode 
diskbalancer` commands to balance disks on a datanode.
 
 ## How DiskBalancer Decides What to Move
 
-DiskBalancer uses simple but robust policies to decide **which disks to 
balance** and **which containers to move** (see the design doc for details: 
`diskbalancer.md` in 
[HDDS-5713](https://issues.apache.org/jira/browse/HDDS-5713)).
+DiskBalancer uses a simple but robust policy to decide **which disks to 
balance** and **which containers to move** (see the design doc for details: 
`diskbalancer.md` in 
[HDDS-5713](https://issues.apache.org/jira/browse/HDDS-5713)).
 
-- **Default Volume Choosing Policy**: Picks the most over‑utilized volume as 
the source and the most under‑utilized volume as the destination, based on each 
disk’s **Volume Data Density** and the Datanode’s average utilization.
-- **Default Container Choosing Policy**: Scans containers on the source volume 
and moves only **CLOSED** containers that are not already being moved. To avoid 
repeatedly scanning the same list, it caches container metadata with automatic 
expiry.
+- **Default Container Choosing Policy**: This is the default policy that 
consolidates both volume selection and container selection into a single 
operation. It identifies the most over-utilized volume
+as the source and the most under-utilized volume with sufficient space as the 
destination, then iterates through containers on the source to pick the first 
one that is movable (per `hdds.datanode.disk.balancer.container.states`,
+default **CLOSED** and **QUASI_CLOSED**) and is not already being moved. It 
caches the list of containers for each volume, and the cache expires automatically after one hour.
 
 These defaults aim to make safe, incremental moves that converge the disks 
toward an even utilization state.
 
@@ -56,13 +57,22 @@ When DiskBalancer moves a container from one disk to 
another on the **same Datan
 2. Transition that copy into a **RECOVERING** state and import it as a new 
container on the destination.
 3. Once import and metadata updates succeed, delete the original CLOSED 
container from the source disk.
 
+```
+D1     ----> C1-CLOSED  --- (5) ---> C1-DELETED
+        |
+        |
+       (1)
+        |
+D2      ----> Temp C1-CLOSED  --- (2) ---> Temp C1-RECOVERING --- (3) ---> 
C1-RECOVERING --- (4) ---> C1-CLOSED
+```
+
 This ensures that data is always consistent: the destination copy is fully 
validated before the original is removed, minimizing risk during balancing.
 
 ## Using Disk Balancer
 
-First, enable the Disk Balancer feature on each Datanode by setting the 
following in `ozone-site.xml`:
+The Disk Balancer has a feature flag that is **enabled by default** on each 
Datanode. It can be disabled by setting the following property in 
`ozone-site.xml`:
 
-- `hdds.datanode.disk.balancer.enabled = true`
+- `hdds.datanode.disk.balancer.enabled = false`
 
 The Disk Balancer CLI supports two command patterns:
 
@@ -121,6 +131,7 @@ The following parameters can be specified during **start** 
or **update configura
 | `--bandwidth-in-mb` | `-b` | `10`          | Maximum bandwidth for 
DiskBalancer per second. |
 | `--parallel-thread` | `-p` | `5`           | Max parallel thread count for 
DiskBalancer. |
 | `--stop-after-disk-even` | `-s` | `true`        | Stop DiskBalancer 
automatically after disk utilization is even. |
+| `--container-states` | `-c` | `CLOSED,QUASI_CLOSED` | Comma-separated list 
of container states that are eligible for moving during balancing. |
 
 ## Benefits for operators
 
diff --git a/cspell.yaml b/cspell.yaml
index 6e0fbeb65..b7264e3d3 100644
--- a/cspell.yaml
+++ b/cspell.yaml
@@ -103,6 +103,7 @@ words:
 - AOS
 - FCQ
 - QoS
+- QUASI_CLOSED
 # Other systems' words
 - savepoints
 - HDDs
diff --git a/docs/05-administrator-guide/03-operations/05-data-balancing/02-disk-balancer.md b/docs/05-administrator-guide/03-operations/05-data-balancing/02-disk-balancer.md
index 8c1174cf9..9c3662df1 100644
--- a/docs/05-administrator-guide/03-operations/05-data-balancing/02-disk-balancer.md
+++ b/docs/05-administrator-guide/03-operations/05-data-balancing/02-disk-balancer.md
@@ -21,9 +21,9 @@ A disk is considered a candidate for balancing if its 
`VolumeDataDensity` exceed
 
 ## Feature Flag
 
-The Disk Balancer feature is introduced with a feature flag. By default, this 
feature is disabled.
+The Disk Balancer feature is introduced with a feature flag. By default, this 
feature is enabled.
 
-The feature can be **enabled** by setting the following property to `true` in 
the `ozone-site.xml` configuration file: `hdds.datanode.disk.balancer.enabled = 
true`
+The feature can be **disabled** by setting the following property to `false` in 
the `ozone-site.xml` configuration file: `hdds.datanode.disk.balancer.enabled = 
false`.
 
 ## Authentication and Authorization
 
@@ -45,9 +45,12 @@ In secure clusters with Kerberos enabled, the Datanode must 
have its Kerberos pr
   </description>
 </property>
 ```
+:::note
 
-**Note:** Without this configuration, DiskBalancer commands will fail with 
authentication errors in secure clusters. The client uses this principal to 
verify the Datanode's identity when establishing RPC connections.
+ Without this configuration, DiskBalancer commands will fail with 
authentication errors in secure clusters. The client uses this principal to 
verify the
+ Datanode's identity when establishing RPC connections.
 
+:::
 ### Authorization Configuration
 
 Each Datanode performs authorization checks using `OzoneAdmins` based on the 
`ozone.administrators` configuration:
@@ -96,7 +99,11 @@ To allow other users to perform DiskBalancer admin 
operations (start, stop, upda
 
 The DiskBalancer is managed through the `ozone admin datanode diskbalancer` 
command.
 
-**Note:** This command is hidden from the main help message (`ozone admin 
datanode --help`). This is because the feature is currently considered 
experimental and is disabled by default. The command is, however, fully 
functional for those who wish to enable and use the feature.
+:::note
+
+ DiskBalancer is enabled by default on datanodes. Use 
`hdds.datanode.disk.balancer.enabled=false` in `ozone-site.xml` to disable the 
service on datanodes and prevent CLI commands from running.
+
+:::
 
 ### Command Syntax
 
@@ -132,15 +139,16 @@ ozone admin datanode diskbalancer report 
[<datanode-address> ...] [--in-service-
 
 ### Command Options
 
-| Option | Description                                                         
                                                                                
                                                                                
                                                                                
                                                             | Example |
-|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------|
-| `<datanode-address>` | One or more Datanode addresses as positional 
arguments. Addresses can be:<br />- Hostname (e.g., `DN-1`) - uses default 
CLIENT_RPC port (19864)<br />- Hostname with port (e.g., `DN-1:19864`)<br />- 
IP address (e.g., `192.168.1.10`)<br />- IP address with port (e.g., 
`192.168.1.10:19864`)<br />- Stdin (`-`) - reads Datanode addresses from 
standard input, one per line | `DN-1`<br />`DN-1:19864`<br />`192.168.1.10`<br 
/>`-` |
-| `--in-service-datanodes` | It queries SCM for all IN_SERVICE Datanodes and 
executes the command on all of them.                                            
                                                                                
                                                                                
                                                                                
 | `--in-service-datanodes` |
-| `--json` | Format output as JSON.                                            
                                                                                
                                                                                
                                                                                
                                                               | `--json` |
-| `-t/--threshold-percentage` | Volume density threshold percentage (default: 
10.0). Used with `start` and `update` commands.                                 
                                                                                
                                                                                
                                                                                
   | `-t 5`<br />`--threshold-percentage 5.0` |
-| `-b/--bandwidth-in-mb` | Maximum disk bandwidth in MB/s (default: 10). Used 
with `start` and `update` commands.                                             
                                                                                
                                                                                
                                                                              | 
`-b 20`<br />`--bandwidth-in-mb 50` |
-| `-p/--parallel-thread` | Number of parallel threads (default: 1). Used with 
`start` and `update` commands.                                                  
                                                                                
                                                                                
                                                                              | 
`-p 5`<br />`--parallel-thread 10` |
-| `-s/--stop-after-disk-even` | Stop automatically after disks are balanced 
(default: true). Used with `start` and `update` commands.                       
                                                                                
                                                                                
                                                                                
     | `-s false`<br />`--stop-after-disk-even true` |
+| Option | Description                                                         
                                                                                
                                                                                
                                                                                
                                                             | Example          
                                         |
+|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------|
+| `<datanode-address>` | One or more Datanode addresses as positional 
arguments. Addresses can be:<br />- Hostname (e.g., `DN-1`) - uses default 
CLIENT_RPC port (19864)<br />- Hostname with port (e.g., `DN-1:19864`)<br />- 
IP address (e.g., `192.168.1.10`)<br />- IP address with port (e.g., 
`192.168.1.10:19864`)<br />- Stdin (`-`) - reads Datanode addresses from 
standard input, one per line | `DN-1`<br />`DN-1:19864`<br />`192.168.1.10`<br 
/>`-`     |
+| `--in-service-datanodes` | It queries SCM for all IN_SERVICE Datanodes and 
executes the command on all of them.                                            
                                                                                
                                                                                
                                                                                
 | `--in-service-datanodes`                                  |
+| `--json` | Format output as JSON.                                            
                                                                                
                                                                                
                                                                                
                                                               | `--json`       
                                           |
+| `-t/--threshold-percentage` | Volume density threshold percentage (default: 
10.0). Used with `start` and `update` commands.                                 
                                                                                
                                                                                
                                                                                
   | `-t 5`<br />`--threshold-percentage 5.0`                  |
+| `-b/--bandwidth-in-mb` | Maximum disk bandwidth in MB/s (default: 10). Used 
with `start` and `update` commands.                                             
                                                                                
                                                                                
                                                                              | 
`-b 20`<br />`--bandwidth-in-mb 50`                       |
+| `-p/--parallel-thread` | Number of parallel threads (default: 1). Used with 
`start` and `update` commands.                                                  
                                                                                
                                                                                
                                                                              | 
`-p 5`<br />`--parallel-thread 10`                        |
+| `-s/--stop-after-disk-even` | Stop automatically after disks are balanced 
(default: true). Used with `start` and `update` commands.                       
                                                                                
                                                                                
                                                                                
     | `-s false`<br />`--stop-after-disk-even true`             |
+| `-c/--container-states` | Comma-separated container lifecycle state names 
that may be moved between disks. Used with `start` and `update` commands. | `-c 
CLOSED,QUASI_CLOSED` <br /> `--container-states CLOSED` |
 
 ### Examples
 
@@ -150,7 +158,7 @@ ozone admin datanode diskbalancer report 
[<datanode-address> ...] [--in-service-
 # Start DiskBalancer on multiple datanodes
 ozone admin datanode diskbalancer start DN-1 DN-2 DN-3
 
-# Start DiskBalancer on all IN_SERVICE datanodes
+# Start DiskBalancer on all IN_SERVICE and HEALTHY datanodes
 ozone admin datanode diskbalancer start --in-service-datanodes
 
 # Start DiskBalancer with configuration parameters
@@ -171,7 +179,7 @@ ozone admin datanode diskbalancer start DN-1 --json
 # Stop DiskBalancer on multiple datanodes
 ozone admin datanode diskbalancer stop DN-1 DN-2 DN-3
 
-# Stop DiskBalancer on all IN_SERVICE datanodes
+# Stop DiskBalancer on all IN_SERVICE and HEALTHY datanodes
 ozone admin datanode diskbalancer stop --in-service-datanodes
 
 # Stop DiskBalancer with json output
@@ -184,7 +192,7 @@ ozone admin datanode diskbalancer stop DN-1 --json
 # Update multiple parameters
 ozone admin datanode diskbalancer update DN-1 -t 5 -b 50 -p 10
 
-# Update on all IN_SERVICE datanodes
+# Update on all IN_SERVICE and HEALTHY datanodes
 ozone admin datanode diskbalancer update --in-service-datanodes -t 5
 # Or using the long form:
 ozone admin datanode diskbalancer update --in-service-datanodes 
--threshold-percentage 5
@@ -199,7 +207,7 @@ ozone admin datanode diskbalancer update DN-1 -b 50 --json
 # Get status from multiple datanodes
 ozone admin datanode diskbalancer status DN-1 DN-2 DN-3
 
-# Get status from all IN_SERVICE datanodes
+# Get status from all IN_SERVICE and HEALTHY datanodes
 ozone admin datanode diskbalancer status --in-service-datanodes
 
 # Get status as JSON
@@ -212,7 +220,7 @@ ozone admin datanode diskbalancer status 
--in-service-datanodes --json
 # Get report from multiple datanodes
 ozone admin datanode diskbalancer report DN-1 DN-2 DN-3
 
-# Get report from all IN_SERVICE datanodes
+# Get report from all IN_SERVICE and HEALTHY datanodes
 ozone admin datanode diskbalancer report --in-service-datanodes
 
 # Get report as JSON
@@ -223,15 +231,16 @@ ozone admin datanode diskbalancer report 
--in-service-datanodes --json
 
 The DiskBalancer's behavior can be controlled using the following 
configuration properties in `ozone-site.xml`.
 
-| Property | Default                                                           
                               | Purpose                                        
                                                                                
                                           |
-|----------|--------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `hdds.datanode.disk.balancer.enabled` | `false`                              
                                                            | If false, the 
DiskBalancer service on the Datanode is disabled. Configure it to true for 
diskBalancer to be enabled.                                                     
 |
-| `hdds.datanode.disk.balancer.volume.density.threshold.percent` | `10.0`      
                                                                                
     | A percentage (0-100). A Datanode is considered balanced if for each 
volume, its utilization differs from the average Datanode utilization by no 
more than this threshold. |
-| `hdds.datanode.disk.balancer.max.disk.throughputInMBPerSec` | `10`           
                                                                                
  | The maximum bandwidth (in MB/s) that the balancer can use for moving data, 
to avoid impacting client I/O.                                                  
               |
-| `hdds.datanode.disk.balancer.parallel.thread` | `5`                          
                                                                    | The 
number of worker threads to use for moving containers in parallel.              
                                                                                
      |
-| `hdds.datanode.disk.balancer.service.interval` | `60s`                       
                                                                     | The time 
interval at which the Datanode DiskBalancer service checks for imbalance and 
updates its configuration.                                                      
    |
-| `hdds.datanode.disk.balancer.stop.after.disk.even` | `true`                  
                                                                         | If 
true, the DiskBalancer will automatically stop its balancing activity once 
disks are considered balanced (i.e., all volume densities are within the 
threshold).        |
-| `hdds.datanode.disk.balancer.volume.choosing.policy` | 
`org.apache.hadoop.`<br />`ozone.container.` <br />`diskbalancer.policy.` <br 
/>`DefaultVolumeChoosingPolicy`    | The policy class for selecting source and 
destination volumes for balancing.                                              
                                                                                
                      |
-| `hdds.datanode.disk.balancer.container.choosing.policy` | 
`org.apache.hadoop.`<br />`ozone.container.` <br />`diskbalancer.policy.` <br 
/>`DefaultContainerChoosingPolicy` | The policy class for selecting which 
containers to move from a source volume to destination volume.                  
                                                                                
                                                   |
-| `hdds.datanode.disk.balancer.service.timeout` | `300s`                       
                                                                    | Timeout 
for the Datanode DiskBalancer service operations.                               
                                                                                
                                             |
-| `hdds.datanode.disk.balancer.should.run.default` | `false`                   
                                                                       | If the 
balancer fails to read its persisted configuration, this value determines if 
the service should run by default.                                              
                                                                                
             |
+| Property | Default                                                           
                                               | Purpose                        
                                                                                
                                                       |
+|----------|------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `hdds.datanode.disk.balancer.enabled` | `true`                               
                                                                            | 
If false, the DiskBalancer service on the Datanode is disabled. By default, 
DiskBalancer is enabled on datanodes.                                           
       |
+| `hdds.datanode.disk.balancer.volume.density.threshold.percent` | `10.0`      
                                                                                
                     | A percentage (0-100). A Datanode is considered balanced 
if for each volume, its utilization differs from the average Datanode 
utilization by no more than this threshold. |
+| `hdds.datanode.disk.balancer.max.disk.throughputInMBPerSec` | `10`           
                                                                                
                  | The maximum bandwidth (in MB/s) that the balancer can use 
for moving data, to avoid impacting client I/O.                                 
                            |
+| `hdds.datanode.disk.balancer.parallel.thread` | `5`                          
                                                                                
    | The number of worker threads to use for moving containers in parallel.    
                                                                                
            |
+| `hdds.datanode.disk.balancer.service.interval` | `60s`                       
                                                                                
     | The time interval at which the Datanode DiskBalancer service checks for 
imbalance and updates its configuration.                                        
              |
+| `hdds.datanode.disk.balancer.stop.after.disk.even` | `true`                  
                                                                                
         | If true, the DiskBalancer will automatically stop its balancing 
activity once disks are considered balanced (i.e., all volume densities are 
within the threshold).    |
+| `hdds.datanode.disk.balancer.replica.deletion.delay` | `5m`                  
                                                                                
           | The delay after a container is successfully moved from source 
volume to destination volume before the source container replica is deleted. 
This lazy deletion gives read threads that still hold the old container replica 
a grace period before their reads fail. Unit: ns, ms, s, m, h, d. |
+| `hdds.datanode.disk.balancer.container.states` | `CLOSED,QUASI_CLOSED`       
                                                                                
     | Comma-separated container lifecycle state names that may be moved 
between disks (must match enum names exactly, uppercase). Default includes 
**CLOSED** and **QUASI_CLOSED**; extend the list when additional states 
need to be balanced. All defined container states which are eligible to move 
**QUASI_CLOSED**, **CLOSED**,  [...]
+| `hdds.datanode.disk.balancer.container.choosing.policy` | 
`org.apache.hadoop.`<br />`ozone.container.` <br />`diskbalancer.policy.` <br 
/>`DefaultContainerChoosingPolicy` | The policy for selecting 
source/destination volumes and which containers to move.                        
                                                                                
                                             |
+| `hdds.datanode.disk.balancer.service.timeout` | `300s`                       
                                                                                
    | Timeout for the Datanode DiskBalancer service operations.                 
                                                                                
                                                       |
+| `hdds.datanode.disk.balancer.should.run.default` | `false`                   
                                                                                
       | If the balancer fails to read its persisted configuration, this value 
determines if the service should run by default.                                
                                                                                
                      |
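
The selection behaviour described in the blog hunk above (most over-utilized volume as source, most under-utilized volume with sufficient space as destination, first movable container that is not already being moved) can be illustrated with a minimal Python sketch. This is not the Ozone implementation: the `Volume` and `Container` classes, the `choose_move` function, and the per-container free-space check are assumptions made for illustration, and the density threshold and container cache are omitted. Only the default state list `CLOSED,QUASI_CLOSED` comes from the commit.

```python
from dataclasses import dataclass, field

# Documented default of hdds.datanode.disk.balancer.container.states.
MOVABLE_STATES = frozenset({"CLOSED", "QUASI_CLOSED"})


@dataclass
class Container:  # hypothetical model, not an Ozone class
    container_id: int
    state: str
    size_bytes: int


@dataclass
class Volume:  # hypothetical model, not an Ozone class
    name: str
    capacity_bytes: int
    used_bytes: int
    containers: list = field(default_factory=list)

    @property
    def utilization(self) -> float:
        return self.used_bytes / self.capacity_bytes

    @property
    def free_bytes(self) -> int:
        return self.capacity_bytes - self.used_bytes


def choose_move(volumes, in_progress=frozenset(), movable_states=MOVABLE_STATES):
    """Return (source, destination, container), or None if nothing can move."""
    if len(volumes) < 2:
        return None
    # Source: the most utilized volume on this Datanode.
    source = max(volumes, key=lambda v: v.utilization)
    for container in source.containers:
        # Skip containers in a non-movable state or already being moved.
        if container.state not in movable_states:
            continue
        if container.container_id in in_progress:
            continue
        # Destination: the least utilized other volume that can hold the container
        # (assumed interpretation of "sufficient space").
        candidates = [
            v for v in volumes
            if v is not source and v.free_bytes >= container.size_bytes
        ]
        if candidates:
            destination = min(candidates, key=lambda v: v.utilization)
            return source, destination, container
    return None
```

Passing a different `movable_states` set mirrors what the `--container-states` option and the `hdds.datanode.disk.balancer.container.states` property are documented to control.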

