Gargi-jais11 opened a new pull request, #10022:
URL: https://github.com/apache/ozone/pull/10022
## What changes were proposed in this pull request?
Currently DiskBalancer can move only **CLOSED** containers. If a user also
wants to move QUASI_CLOSED containers, we should support that as well.
**1st Improvement:**
The scenario below can happen in practice. Disk balancer can only attempt the
10 CLOSED containers. Even if it moves all 10 successfully, disk A might drop
only from 90% → 85%, nowhere near balance. The QUASI_CLOSED containers are the
real bulk occupying the space, and disk balancer is completely blind to them.
```
Disk A: 90% utilized
├── 600 containers: QUASI_CLOSED
└── 10 containers: CLOSED
Disk B: 10% utilized
Disk C: 11% utilized
```
Added a config `hdds.datanode.disk.balancer.include.non.standard.containers`
(**default=false**). If true, the balancer includes non-standard states, i.e.,
QUASI_CLOSED, so both CLOSED and QUASI_CLOSED containers are eligible for a
move. If false (default), the balancer only moves CLOSED containers.
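The eligibility rule described above can be sketched as a small state check. This is only an illustration, not the code in this PR: the `State` enum, the class name, and the method names are hypothetical stand-ins, and the boolean parameter is assumed to carry the value of the new config.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch; names here are illustrative, not Ozone APIs.
final class EligibilitySketch {
  // Simplified stand-in for the container lifecycle states.
  enum State { OPEN, CLOSING, QUASI_CLOSED, CLOSED, UNHEALTHY }

  // Returns the states the balancer may move for a given config value.
  static Set<State> movableStates(boolean includeNonStandard) {
    return includeNonStandard
        ? EnumSet.of(State.CLOSED, State.QUASI_CLOSED)
        : EnumSet.of(State.CLOSED);
  }

  // A container is eligible only if its state is in the movable set.
  static boolean isEligible(State state, boolean includeNonStandard) {
    return movableStates(includeNonStandard).contains(state);
  }
}
```

With the flag left at its default, `isEligible(State.QUASI_CLOSED, false)` is false, which corresponds to the 600 containers in the scenario above that the balancer cannot touch.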
**2nd Improvement:**
We need to add debug logs to the chooseContainer method, because users might
not understand why a container is still not moved even though they have over-
and under-utilised volumes. This part needs more clarification. In an
escalation, balancer debug logs for containers that were not chosen helped a
lot to identify what state the container or volume was in.
I suggest adding these logs for containers that are not chosen:
```
// UsedBytes less than 0
LOG.debug("Skipping container {} from volume {}: bytes used is {}",
    containerId, src.getStorageDir().getPath(), containerData.getBytesUsed());
// Skip containers already in progress
LOG.debug("Skipping container {} from volume {}: disk balancer move already in progress",
    containerId, src.getStorageDir().getPath());
// Only closed and quasi closed containers should be moved
LOG.debug("Skipping container {} from volume {}: state is {}. Requires {}",
    containerId, src.getStorageDir().getPath(), containerData.getState(),
    phase.getEligibilityCriteria());
// Skip the move as the container size is more than the destination's available space
LOG.debug("Skipping container {} ({}B) from volume {}: exceeds destination {} usable space {}B",
    containerId, containerSize, src.getStorageDir().getPath(),
    dst.getStorageDir().getPath(), usableSpace);
// Skip the move as it would make the destination over-utilised
LOG.debug("Skipping container {} ({}B) from volume {}: moving to {} would result in utilization {} exceeding upper threshold {}",
    containerId, containerSize, src.getStorageDir().getPath(),
    dst.getStorageDir().getPath(),
```
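To show how the checks behind these log messages might be ordered, here is a minimal sketch that returns the first skip reason for a candidate container. It is not the actual chooseContainer implementation: the class, the parameter names, and the post-move utilization formula `(dstUsed + containerSize) / dstCapacity` are all assumptions made for illustration.

```java
import java.util.Optional;

// Hypothetical sketch; not the chooseContainer code in this PR.
final class SkipReasonSketch {
  enum State { OPEN, QUASI_CLOSED, CLOSED }

  // Returns the first reason to skip a container, or empty if it can move.
  static Optional<String> skipReason(long containerSize, boolean moveInProgress,
      State state, boolean includeQuasiClosed, long dstUsableSpace,
      long dstUsed, long dstCapacity, double upperThreshold) {
    if (containerSize < 0) {
      return Optional.of("bytes used is " + containerSize);
    }
    if (moveInProgress) {
      return Optional.of("disk balancer move already in progress");
    }
    boolean stateOk = state == State.CLOSED
        || (includeQuasiClosed && state == State.QUASI_CLOSED);
    if (!stateOk) {
      return Optional.of("state is " + state);
    }
    if (containerSize > dstUsableSpace) {
      return Optional.of("exceeds destination usable space");
    }
    // Assumed formula: destination utilization after the move.
    double after = (double) (dstUsed + containerSize) / dstCapacity;
    if (after > upperThreshold) {
      return Optional.of("would exceed upper threshold");
    }
    return Optional.empty();
  }
}
```

Returning the reason as a value, instead of logging inline, keeps each skip condition testable on its own. The caller would then emit the matching debug line.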
## What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-14926
## How was this patch tested?
Added unit test.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]