[jira] [Updated] (HDDS-14926) Allow QUASI_CLOSED containers in DiskBalancer with improved debug logging for containers

Gargi Jaiswal (Jira) Mon, 30 Mar 2026 02:27:31 -0700


     [ 
https://issues.apache.org/jira/browse/HDDS-14926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Gargi Jaiswal updated HDDS-14926:
---------------------------------
    Description: 
Currently DiskBalancer only moves CLOSED containers. If a user also wants to 
move QUASI_CLOSED containers, we should support that as well.

*1st Improvement:*
The scenario below can happen in practice. DiskBalancer can only attempt to 
move the {*}10 CLOSED containers{*}. Even if it moves all 10 successfully, 
disk A might drop only from 90% to 85%, which is nowhere near balance. The 
QUASI_CLOSED containers occupy the bulk of the space, and DiskBalancer is 
completely blind to them.
{code:java}
Disk A: 90% utilized 
├── 600 containers: QUASI_CLOSED 
└── 10 containers: CLOSED 
Disk B: 10% utilized 
Disk C: 11% utilized{code}
 
We can add a config that controls whether *QUASI_CLOSED* containers may be 
moved. It will default to false.
{code:java}
hdds.disk.balancer.include.non.standard.containers=false{code}
When the above config is false, only CLOSED containers are moved. In a 
scenario like the one above, users who also want to move QUASI_CLOSED 
containers can set the config to true.
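
As a rough illustration of the intended behaviour (not the actual DiskBalancer code), the eligibility check could look like the sketch below. The {{ContainerState}} enum, the {{MovableStates}} class and its method names are hypothetical stand-ins, and the boolean parameter is assumed to carry the value of the proposed config.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-in for the datanode's container states; only the
// states needed for this illustration are listed.
enum ContainerState { OPEN, CLOSING, QUASI_CLOSED, CLOSED }

final class MovableStates {
  private MovableStates() { }

  // includeNonStandard is assumed to mirror the proposed
  // hdds.disk.balancer.include.non.standard.containers setting.
  static Set<ContainerState> movableStates(boolean includeNonStandard) {
    return includeNonStandard
        ? EnumSet.of(ContainerState.CLOSED, ContainerState.QUASI_CLOSED)
        : EnumSet.of(ContainerState.CLOSED);
  }

  // True if a container in the given state may be picked for a move.
  static boolean isMovable(ContainerState state, boolean includeNonStandard) {
    return movableStates(includeNonStandard).contains(state);
  }

  public static void main(String[] args) {
    // With the default (false), QUASI_CLOSED containers are not movable.
    System.out.println(isMovable(ContainerState.QUASI_CLOSED, false));
    // With the config enabled, they become movable.
    System.out.println(isMovable(ContainerState.QUASI_CLOSED, true));
  }
}
```

Keeping the set of movable states in one place means the skip log messages can report the same set that the check uses.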
 
*2nd Improvement:*
We need to add debug logs to the *chooseContainer* method, because users may 
not understand why no container is moved even though they have over-utilised 
and under-utilised volumes. This part needs more clarification. In an 
escalation, balancer debug logs for containers that were not chosen helped a 
lot in identifying what state the container or volume was in.
I suggest adding these logs for containers that are not chosen:

 
{code:java}
// Shows which state the container was in that kept it from being moved.
LOG.debug("Skipping container {} from volume {}: state is {} "
    + "(only CLOSED is eligible)",
    containerId, src.getStorageID(), containerData.getState());

// The container size can exceed the usable space on the destination.
LOG.debug("Skipping container {} ({}B) from volume {}: exceeds destination {} "
    + "usable space {}B",
    containerId, containerSize, src.getStorageID(), dst.getStorageID(),
    usableSpace);

// The move would push destination utilization over the upper threshold.
LOG.debug("Skipping container {} ({}B) from volume {}: moving to {} would "
    + "result in utilization {} exceeding upper threshold {}",
    containerId, containerSize, src.getStorageID(), dst.getStorageID(),
    newUtilization, upperThreshold);{code}
For a message like {color:#de350b}{{"No suitable container found for 
destination {}, trying next volume."}}{color}, we can improve it to something 
like {color:#00875a}{{"No suitable CLOSED or QUASI_CLOSED container found for 
destination {}, trying next volume."}}{color}.
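
To show how the three skip conditions relate to each other, here is a minimal self-contained sketch. It is only an assumption about the order and shape of the checks, not the real *chooseContainer* implementation. The {{SkipReasons}} class, the {{skipReason}} method and all of its parameters are hypothetical, sizes are in bytes, and the threshold is assumed to be a fraction of capacity.

```java
import java.util.Optional;

final class SkipReasons {
  private SkipReasons() { }

  // Returns the reason a container would be skipped for a destination
  // volume, or empty if none of the three checks rejects it.
  // All parameters are hypothetical inputs for this illustration.
  static Optional<String> skipReason(boolean stateEligible, long containerSize,
      long dstUsed, long dstCapacity, long dstUsableSpace,
      double upperThreshold) {
    // Check 1: the container state is not one the balancer may move.
    if (!stateEligible) {
      return Optional.of("state is not eligible");
    }
    // Check 2: the container does not fit into the destination.
    if (containerSize > dstUsableSpace) {
      return Optional.of("exceeds destination usable space");
    }
    // Check 3: the move would push the destination over the threshold.
    double newUtilization = (double) (dstUsed + containerSize) / dstCapacity;
    if (newUtilization > upperThreshold) {
      return Optional.of("would exceed upper threshold");
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Destination: capacity 100 B, 10 B used, 90 B usable, threshold 50%.
    System.out.println(skipReason(true, 5, 10, 100, 90, 0.5));
    System.out.println(skipReason(true, 95, 10, 100, 90, 0.5));
    System.out.println(skipReason(true, 50, 10, 100, 90, 0.5));
    System.out.println(skipReason(false, 5, 10, 100, 90, 0.5));
  }
}
```

Returning the reason as a value would let the caller emit one debug line per skipped container and reuse the same text in the "No suitable container" summary message.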


> Allow QUASI_CLOSED containers in DiskBalancer with improved debug logging for 
> containers
> ----------------------------------------------------------------------------------------
>
>                 Key: HDDS-14926
>                 URL: https://issues.apache.org/jira/browse/HDDS-14926
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Gargi Jaiswal
>            Assignee: Gargi Jaiswal
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

