[
https://issues.apache.org/jira/browse/HDDS-14926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gargi Jaiswal updated HDDS-14926:
---------------------------------
Description:
Currently, DiskBalancer only moves containers in the CLOSED state. If a user also
wants to move QUASI_CLOSED containers, DiskBalancer should support that as well.
*1st Improvement:*
In the scenario below, which can occur in practice, DiskBalancer can only attempt
to move the {*}10 CLOSED containers{*}. Even if it moves all 10 successfully,
disk A might drop only from 90% → 85%, nowhere near balance. The QUASI_CLOSED
containers occupy the bulk of the space, yet DiskBalancer is completely blind to
them.
{code:java}
Disk A: 90% utilized
├── 600 containers: QUASI_CLOSED
└── 10 containers: CLOSED
Disk B: 10% utilized
Disk C: 11% utilized{code}
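A rough calculation shows how far the scenario above is from balance. It assumes (this is not stated in the scenario) that disks A, B and C have equal capacity, so the balanced target is simply the mean utilization; the 85% figure is the illustrative value from the text, not a measurement.

```java
// Back-of-the-envelope check for the Disk A/B/C scenario.
// Assumption: all three disks have the same capacity, so the ideal
// (balanced) utilization is the plain mean of the per-disk values.
class ScenarioMath {

  /** Mean utilization across disks, in percent. */
  static double idealUtilization(double... utilizations) {
    double sum = 0;
    for (double u : utilizations) {
      sum += u;
    }
    return sum / utilizations.length;
  }

  public static void main(String[] args) {
    double ideal = idealUtilization(90, 10, 11); // disks A, B, C
    double afterClosedOnly = 85;                 // disk A after moving the 10 CLOSED containers
    System.out.printf("ideal=%.1f%% gapBefore=%.1f gapAfterClosedOnly=%.1f%n",
        ideal, 90 - ideal, afterClosedOnly - ideal);
  }
}
```

Under that assumption the target is about 37%, so disk A would still sit roughly 48 points above it after moving every CLOSED container; only the QUASI_CLOSED data can close that gap.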
Added a config {{hdds.datanode.disk.balancer.include.non.standard.containers}}
(*default=false*). If true, the balancer also includes non-standard states,
i.e., QUASI_CLOSED, so both CLOSED and QUASI_CLOSED containers are eligible for
a move. If false (the default), the balancer moves only CLOSED containers.
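A minimal sketch of the state filter this config controls. {{ContainerState}}, {{eligibleStates}} and {{isEligible}} are hypothetical stand-ins, not Ozone's actual classes or methods; the enum lists only a subset of container states for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

// Sketch of the eligibility check driven by
// hdds.datanode.disk.balancer.include.non.standard.containers.
// ContainerState is a hypothetical stand-in for Ozone's container state enum.
class EligibilitySketch {

  enum ContainerState { OPEN, CLOSING, QUASI_CLOSED, CLOSED, UNHEALTHY }

  /** States the balancer may move, depending on the config flag. */
  static Set<ContainerState> eligibleStates(boolean includeNonStandard) {
    return includeNonStandard
        ? EnumSet.of(ContainerState.CLOSED, ContainerState.QUASI_CLOSED)
        : EnumSet.of(ContainerState.CLOSED);
  }

  static boolean isEligible(ContainerState state, boolean includeNonStandard) {
    return eligibleStates(includeNonStandard).contains(state);
  }

  public static void main(String[] args) {
    for (boolean flag : new boolean[] {false, true}) {
      System.out.println("include.non.standard.containers=" + flag
          + " -> " + eligibleStates(flag));
    }
  }
}
```

Keeping the allowed states in one set means the default (flag off) path is unchanged: only CLOSED passes, and OPEN or UNHEALTHY containers are never candidates in either mode.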
*2nd Improvement:*
We need to add debug logs to the *chooseContainer* method. Otherwise, a user who
has both over-utilized and under-utilized volumes may not understand why no
container is being moved, so this code path needs better visibility. In past
escalations, balancer debug logs explaining why a container was not chosen
helped a lot in identifying what state the container or volume was in.
I suggest adding the following logs for containers that are not chosen:
{code:java}
// Shows which container state prevented the move
LOG.debug("Skipping container {} from volume {}: state is {} (only CLOSED is "
    + "eligible)", containerId, src.getStorageID(), containerData.getState());
// A container can be larger than the usable space on the destination volume
LOG.debug("Skipping container {} ({}B) from volume {}: exceeds destination {} "
    + "usable space {}B", containerId, containerSize, src.getStorageID(),
    dst.getStorageID(), usableSpace);
// Moving the container would push destination utilization over the upper threshold
LOG.debug("Skipping container {} ({}B) from volume {}: moving to {} would "
    + "result in utilization {} exceeding upper threshold {}", containerId,
    containerSize, src.getStorageID(), dst.getStorageID(), newUtilization,
    upperThreshold);{code}
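A sketch of where these three skip reasons could fire inside a chooseContainer-style loop. {{Candidate}}, {{skipReason}} and {{choose}} are hypothetical; it assumes the upper threshold is expressed as a utilization fraction of the destination's capacity, and it collects messages in a list so the example runs without a logging dependency, where real code would call SLF4J's {{LOG.debug}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical chooseContainer-style loop: every skipped candidate gets a reason.
// Real code would pass these reasons to LOG.debug(...) instead of a list.
class ChooseContainerSketch {

  static final class Candidate {
    final long id;
    final long sizeBytes;
    final boolean eligibleState; // result of the CLOSED / QUASI_CLOSED state check

    Candidate(long id, long sizeBytes, boolean eligibleState) {
      this.id = id;
      this.sizeBytes = sizeBytes;
      this.eligibleState = eligibleState;
    }
  }

  /** Returns why the candidate is skipped, or empty if it can be moved. */
  static Optional<String> skipReason(Candidate c, long dstUsed, long dstCapacity,
      long dstUsable, double upperThreshold) {
    if (!c.eligibleState) {
      return Optional.of(String.format(
          "Skipping container %d: state is not eligible", c.id));
    }
    if (c.sizeBytes > dstUsable) {
      return Optional.of(String.format(
          "Skipping container %d (%dB): exceeds destination usable space %dB",
          c.id, c.sizeBytes, dstUsable));
    }
    double newUtilization = (double) (dstUsed + c.sizeBytes) / dstCapacity;
    if (newUtilization > upperThreshold) {
      return Optional.of(String.format(
          "Skipping container %d (%dB): utilization %.2f would exceed upper threshold %.2f",
          c.id, c.sizeBytes, newUtilization, upperThreshold));
    }
    return Optional.empty();
  }

  /** Picks the first movable candidate and records a reason for each skipped one. */
  static Optional<Candidate> choose(List<Candidate> candidates, long dstUsed,
      long dstCapacity, long dstUsable, double upperThreshold, List<String> debugLog) {
    for (Candidate c : candidates) {
      Optional<String> reason =
          skipReason(c, dstUsed, dstCapacity, dstUsable, upperThreshold);
      if (!reason.isPresent()) {
        return Optional.of(c);
      }
      debugLog.add(reason.get()); // stands in for LOG.debug(...)
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    List<Candidate> candidates = Arrays.asList(
        new Candidate(1, 100, false),  // wrong state
        new Candidate(2, 500, true),   // larger than usable space
        new Candidate(3, 300, true),   // would push destination to 80%
        new Candidate(4, 100, true));  // movable
    List<String> debugLog = new ArrayList<>();
    Optional<Candidate> chosen = choose(candidates, 500, 1000, 400, 0.75, debugLog);
    debugLog.forEach(System.out::println);
    System.out.println("chosen: " + chosen.map(c -> c.id).orElse(-1L));
  }
}
```

Checking the reasons in a fixed order (state, then size, then threshold) means each skipped container produces exactly one log line naming the first check it failed.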
The existing message {color:#de350b}{{No suitable container found for
destination {}, trying next volume.}}{color} can be improved to something like
{color:#00875a}{{No suitable CLOSED or QUASI_CLOSED container found for
destination {}, trying next volume.}}{color}.
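One option is to derive the message from the same config flag, so the log never claims QUASI_CLOSED containers were considered when the flag is off. The class and method names below are hypothetical; the {{\{\}}} placeholder is kept for SLF4J-style formatting.

```java
// Hypothetical helper: builds the "no suitable container" debug message so it
// names only the states the balancer actually considered. The flag mirrors
// hdds.datanode.disk.balancer.include.non.standard.containers.
class NoSuitableContainerMessage {

  static String template(boolean includeNonStandard) {
    String states = includeNonStandard ? "CLOSED or QUASI_CLOSED" : "CLOSED";
    return "No suitable " + states
        + " container found for destination {}, trying next volume.";
  }

  public static void main(String[] args) {
    System.out.println(template(false));
    System.out.println(template(true));
  }
}
```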
was:
Currently, DiskBalancer only moves containers in the CLOSED state. If a user also
wants to move QUASI_CLOSED containers, DiskBalancer should support that as well.
*1st Improvement:*
In the scenario below, which can occur in practice, DiskBalancer can only attempt
to move the {*}10 CLOSED containers{*}. Even if it moves all 10 successfully,
disk A might drop only from 90% → 85%, nowhere near balance. The QUASI_CLOSED
containers occupy the bulk of the space, yet DiskBalancer is completely blind to
them.
{code:java}
Disk A: 90% utilized
├── 600 containers: QUASI_CLOSED
└── 10 containers: CLOSED
Disk B: 10% utilized
Disk C: 11% utilized{code}
We will have to rewrite the logic so that, when source and destination volumes
are available but no CLOSED containers are left to move, the balancer starts
moving QUASI_CLOSED containers. Once the balancer finds the utilization of all
disks within the threshold, it stops automatically.
*2nd Improvement:*
We need to add debug logs to the *chooseContainer* method. Otherwise, a user who
has both over-utilized and under-utilized volumes may not understand why no
container is being moved, so this code path needs better visibility. In past
escalations, balancer debug logs explaining why a container was not chosen
helped a lot in identifying what state the container or volume was in.
I suggest adding the following logs for containers that are not chosen:
{code:java}
// Shows which container state prevented the move
LOG.debug("Skipping container {} from volume {}: state is {} (only CLOSED is "
    + "eligible)", containerId, src.getStorageID(), containerData.getState());
// A container can be larger than the usable space on the destination volume
LOG.debug("Skipping container {} ({}B) from volume {}: exceeds destination {} "
    + "usable space {}B", containerId, containerSize, src.getStorageID(),
    dst.getStorageID(), usableSpace);
// Moving the container would push destination utilization over the upper threshold
LOG.debug("Skipping container {} ({}B) from volume {}: moving to {} would "
    + "result in utilization {} exceeding upper threshold {}", containerId,
    containerSize, src.getStorageID(), dst.getStorageID(), newUtilization,
    upperThreshold);{code}
The existing message {color:#de350b}{{No suitable container found for
destination {}, trying next volume.}}{color} can be improved to something like
{color:#00875a}{{No suitable CLOSED or QUASI_CLOSED container found for
destination {}, trying next volume.}}{color}.
> Allow QUASI_CLOSED containers in DiskBalancer with improved debug logging for
> containers
> ----------------------------------------------------------------------------------------
>
> Key: HDDS-14926
> URL: https://issues.apache.org/jira/browse/HDDS-14926
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Gargi Jaiswal
> Assignee: Gargi Jaiswal
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)