Ashish Kumar created HDDS-15535:
-----------------------------------
Summary: Container Balancer should validate configuration and
report startup failures to user
Key: HDDS-15535
URL: https://issues.apache.org/jira/browse/HDDS-15535
Project: Apache Ozone
Issue Type: Task
Reporter: Ashish Kumar
When the balancer is started from the CLI, it reports that it started
successfully, but it can then fail silently inside SCM.
Such failures should instead be reported to the user at start time, so that
the command can be corrected.
Example: a cluster has 10 datanodes and the balancer is started with the
following configuration:
{code:java}
ozone admin containerbalancer start \
  --balancing-iteration-interval-minutes=15 \
  --max-datanodes-percentage-to-involve-per-iteration=10 \
  --max-size-entering-target-in-gb=50 \
  --iterations=10 \
  --max-size-leaving-source-in-gb=50 \
  --max-size-to-move-per-iteration-in-gb=300 \
  --threshold=5
Container Balancer started successfully. {code}
CLI output: Container Balancer started successfully.
However, in a cluster with 10 datanodes:
max-datanodes-percentage-to-involve-per-iteration=10
10% of 10 datanodes = 1 datanode
Container balancing requires both source and target datanodes to perform a
move. With only one datanode allowed to participate, balancing cannot proceed.
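The arithmetic above suggests the kind of check that could run before the CLI reports success. The following is only a minimal sketch, not existing Ozone code: the class BalancerStartupCheck and its methods are hypothetical, and it assumes the datanode limit is the percentage of the cluster size truncated to a whole number, which matches the "limit is 1" value in the example.

```java
import java.util.Optional;

// Hypothetical helper for illustration only; not part of Apache Ozone.
final class BalancerStartupCheck {
    private BalancerStartupCheck() { }

    // Assumption: the per-iteration limit is percentage * totalDatanodes / 100,
    // truncated toward zero (10% of 10 datanodes gives 1).
    static int maxDatanodesToInvolve(int totalDatanodes, int percentage) {
        return (int) ((long) totalDatanodes * percentage / 100);
    }

    // Returns an error message when the configuration cannot make progress,
    // or an empty Optional when this particular check passes.
    static Optional<String> validate(int totalDatanodes, int percentage) {
        if (percentage < 0 || percentage > 100) {
            return Optional.of("Percentage must be between 0 and 100, got " + percentage);
        }
        int limit = maxDatanodesToInvolve(totalDatanodes, percentage);
        // A move needs at least one source and one target datanode.
        if (limit < 2) {
            return Optional.of(String.format(
                "%d%% of %d datanodes allows only %d datanode(s) per iteration; "
                    + "at least 2 are needed (one source, one target).",
                percentage, totalDatanodes, limit));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The configuration from this report: rejected with a message.
        System.out.println(validate(10, 10).orElse("OK"));
        // A configuration that allows two datanodes: accepted.
        System.out.println(validate(10, 20).orElse("OK"));
    }
}
```

With a check of this shape, the start command could return the message to the CLI instead of "Container Balancer started successfully." Where the validation should live (CLI or SCM) is left open here.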
The failure is only visible in SCM logs:
{code:java}
2026-06-10 06:49:56,159 DEBUG
[node2-ContainerBalancerTask-2]-org.apache.hadoop.hdds.scm.container.balancer.ContainerBalancerTask:
Approaching max datanodes to involve limit. 0 datanodes have already been
selected for balancing and the limit is 1. Only already selected targets can be
selected as targets now.
------
2026-06-10 06:49:56,178 INFO
[node2-ContainerBalancerTask-2]-org.apache.hadoop.hdds.scm.container.balancer.ContainerBalancerTask:
Result of this iteration of Container Balancer: CAN_NOT_BALANCE_ANY_MORE{code}