Gargi-jais11 opened a new pull request, #9465:
URL: https://github.com/apache/ozone/pull/9465
## What changes were proposed in this pull request?
- EstimatedBytesToMove and EstimatedTimeLeft should not be shown if no
container movement happens.
- Improve threshold validation error message. When running the DiskBalancer
update command with a threshold value of 100.0, the operation fails on all
datanodes with the following error:
```
bash> ozone admin datanode diskbalancer update -t 100.0 --in-service-datanodes
Error on node [DN-1]: Threshold must be a percentage(double) in the range 0 to 100.
```
A threshold of 0 means that any deviation from ideal usage (even 0.01%) triggers
container movement.
This leads to excessive, continuous balancing operations and results in
unnecessary I/O overhead and resource consumption.
A threshold value can never be 100.0%, as that would mean allowing 100% of
a disk's contents to be moved, effectively emptying one disk.

Suggested improvement:
The error message should clarify that both 0 and 100 are excluded. The
validation is being updated to exclude 0, requiring the threshold to be in
the range (0, 100), exclusive.

New error message:
```
Error on node [DN-1]: Threshold must be a percentage(double) in the range 0 to 100 both exclusive.
```
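
The exclusive-range rule described above can be sketched as follows. This is only an illustration, not the code changed in this PR: the class `ThresholdCheck` and its methods are hypothetical names, and only the error text is taken from the description.

```java
/** Hypothetical helper for illustration; not the actual Ozone class. */
final class ThresholdCheck {
  private ThresholdCheck() { }

  /** Returns true only for values strictly between 0 and 100. */
  static boolean isValid(double threshold) {
    // NaN fails both comparisons, so it is rejected as well.
    return threshold > 0.0 && threshold < 100.0;
  }

  /** Returns the threshold unchanged, or throws if it is outside (0, 100). */
  static double validate(double threshold) {
    if (!isValid(threshold)) {
      throw new IllegalArgumentException(
          "Threshold must be a percentage(double) in the range 0 to 100 "
              + "both exclusive.");
    }
    return threshold;
  }

  public static void main(String[] args) {
    // Values taken from the manual tests in this description, plus 50.0.
    for (double t : new double[] {0.0, 0.001, 50.0, 100.0}) {
      System.out.println(t + " -> " + (isValid(t) ? "accepted" : "rejected"));
    }
  }
}
```

Under this sketch, `0` and `100` are rejected while `0.001` is accepted, which matches the manual test results listed further down.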
## What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-14110
## How was this patch tested?
Added a check for estimatedBytes in the unit test `TestDiskBalancerService`.
Tested manually:
Before patch:
```
bash-5.1$ ozone admin datanode diskbalancer status --in-service-datanodes
Status result:
Datanode                       Status  Threshold(%) BandwidthInMB Threads StopAfterDiskEven SuccessMove FailureMove BytesMoved(MB) EstBytesToMove(MB) EstTimeLeft(min)
ozone-datanode-5.ozone_default RUNNING 0.0001       10            5       false             0           0           0              638                2
ozone-datanode-3.ozone_default RUNNING 0.0001       10            5       false             0           0           0              1                  1
ozone-datanode-4.ozone_default RUNNING 0.0001       10            5       false             0           0           0              1                  1
ozone-datanode-2.ozone_default RUNNING 0.0001       10            5       false             0           0           0              698                2
ozone-datanode-1.ozone_default RUNNING 0.0001       10            5       false             0           0           0              3                  1
Note: Estimated time left is calculated based on the estimated bytes to move and the configured disk bandwidth.
```
After the code changes, the output is fixed:
```
bash-5.1$ ozone admin datanode diskbalancer report --in-service-datanodes
Report result:
Datanode VolumeDensity
ozone-datanode-2.ozone_default 8.413243594594944E-4
ozone-datanode-5.ozone_default 8.296842069073773E-4
ozone-datanode-3.ozone_default 7.682500684380311E-4
ozone-datanode-1.ozone_default 7.585499413112762E-4
ozone-datanode-4.ozone_default 7.507898396098833E-4
bash-5.1$ ozone admin datanode diskbalancer status --in-service-datanodes
Status result:
Datanode                       Status  Threshold(%) BandwidthInMB Threads StopAfterDiskEven SuccessMove FailureMove BytesMoved(MB) EstBytesToMove(MB) EstTimeLeft(min)
ozone-datanode-1.ozone_default RUNNING 0.0001       10            5       false             0           0           0              0                  0
ozone-datanode-4.ozone_default RUNNING 0.0001       10            5       false             0           0           0              0                  0
ozone-datanode-3.ozone_default RUNNING 0.0001       10            5       false             0           0           0              0                  0
ozone-datanode-5.ozone_default RUNNING 0.0001       10            5       false             0           0           0              0                  0
ozone-datanode-2.ozone_default RUNNING 0.0001       10            5       false             0           0           0              0                  0
Note: Estimated time left is calculated based on the estimated bytes to move and the configured disk bandwidth.
```
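
The note in the status output says the estimated time left is derived from the estimated bytes to move and the configured disk bandwidth. A rough sketch of such a calculation is given here under stated assumptions: `BandwidthInMB` is treated as MB per second, the result is rounded up to whole minutes, and the class and method names are hypothetical. The actual formula in the DiskBalancer code may differ.

```java
/** Hypothetical sketch; not the actual DiskBalancer implementation. */
final class EstimateSketch {
  private EstimateSketch() { }

  /** Integer division rounded up, for positive operands. */
  private static long ceilDiv(long a, long b) {
    return (a + b - 1) / b;
  }

  /**
   * Estimates minutes left for moving the given amount of data.
   * Assumption: bandwidth is expressed in MB per second.
   * Returns 0 when there is nothing to move or the bandwidth is not positive.
   */
  static long estimatedMinutesLeft(long bytesToMoveMB, long bandwidthMBPerSec) {
    if (bytesToMoveMB <= 0 || bandwidthMBPerSec <= 0) {
      return 0;
    }
    long seconds = ceilDiv(bytesToMoveMB, bandwidthMBPerSec);
    return ceilDiv(seconds, 60);
  }

  public static void main(String[] args) {
    // EstBytesToMove(MB) values from the status samples, bandwidth 10.
    for (long mb : new long[] {638, 698, 3, 1, 0}) {
      System.out.println(mb + " MB -> " + estimatedMinutesLeft(mb, 10) + " min");
    }
  }
}
```

With a bandwidth of 10, this sketch yields 2 minutes for 638 MB and 698 MB, 1 minute for 3 MB and 1 MB, and 0 minutes when nothing is left to move, which is consistent with the sample rows shown in this description.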
Threshold error output:
```
bash-5.1$ ozone admin datanode diskbalancer start -t 0 --in-service-datanodes
Error on node [172.18.0.11:19864]: Threshold must be a percentage(double) in the range 0 to 100 both exclusive.
Error on node [172.18.0.10:19864]: Threshold must be a percentage(double) in the range 0 to 100 both exclusive.
Error on node [172.18.0.8:19864]: Threshold must be a percentage(double) in the range 0 to 100 both exclusive.
Error on node [172.18.0.9:19864]: Threshold must be a percentage(double) in the range 0 to 100 both exclusive.
Error on node [172.18.0.7:19864]: Threshold must be a percentage(double) in the range 0 to 100 both exclusive.
Failed to start DiskBalancer on nodes: [172.18.0.11:19864, 172.18.0.10:19864, 172.18.0.8:19864, 172.18.0.9:19864, 172.18.0.7:19864]
bash-5.1$ ozone admin datanode diskbalancer start -t 100 --in-service-datanodes
Error on node [172.18.0.11:19864]: Threshold must be a percentage(double) in the range 0 to 100 both exclusive.
Error on node [172.18.0.10:19864]: Threshold must be a percentage(double) in the range 0 to 100 both exclusive.
Error on node [172.18.0.8:19864]: Threshold must be a percentage(double) in the range 0 to 100 both exclusive.
Error on node [172.18.0.9:19864]: Threshold must be a percentage(double) in the range 0 to 100 both exclusive.
Error on node [172.18.0.7:19864]: Threshold must be a percentage(double) in the range 0 to 100 both exclusive.
Failed to start DiskBalancer on nodes: [172.18.0.11:19864, 172.18.0.10:19864, 172.18.0.8:19864, 172.18.0.9:19864, 172.18.0.7:19864]
bash-5.1$ ozone admin datanode diskbalancer start -t 0.001 --in-service-datanodes
Started DiskBalancer on all IN_SERVICE nodes.
```