Siddhant Sangwan created HDDS-14614:
---------------------------------------
Summary: Improve handling for REPLICATION_NOT_HEALTHY_BEFORE_MOVE in Container Balancer
Key: HDDS-14614
URL: https://issues.apache.org/jira/browse/HDDS-14614
Project: Apache Ozone
Issue Type: Improvement
Components: SCM
Affects Versions: 2.1.0
Reporter: Siddhant Sangwan
I've seen cases where a container passes the first replication/deletion check:
{code}
private boolean isContainerReplicatingOrDeleting(ContainerID containerID) {
  return replicationManager.isContainerReplicatingOrDeleting(containerID);
}
{code}
but fails later in MoveManager, which does a second check for the same thing in
a different way:
{code}
/*
 If the container is under, over, or mis replicated, we should let
 replication manager solve these issues first. Fail move for such a
 container.
 */
ContainerHealthResult healthBeforeMove =
    replicationManager.getContainerReplicationHealth(containerInfo,
        currentReplicas);
if (healthBeforeMove.getHealthState() !=
    ContainerHealthResult.HealthState.HEALTHY) {
  ret.complete(MoveResult.REPLICATION_NOT_HEALTHY_BEFORE_MOVE);
  return ret;
}
{code}
Since the MoveManager check is the stricter of the two, we should apply the
same check up front in ContainerBalancerSelectionCriteria, so that containers
that would fail it are not selected in the first place. Independently of that,
if a move still fails with REPLICATION_NOT_HEALTHY_BEFORE_MOVE, we need to add
the source datanode back to the priority queue so it can be considered for
another move. See https://issues.apache.org/jira/browse/HDDS-7252.
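
A rough sketch of the two proposed changes, for discussion only. It does not
use the real Ozone classes: HealthLookup, shouldBeExcluded, move, and
requeueSourceIfUnhealthy are hypothetical stand-ins, the enums are simplified
copies of the names quoted above, and a plain Queue of datanode names stands in
for the balancer's priority queue of sources.

```java
import java.util.Queue;

/** Sketch only: simplified stand-ins, not the actual Ozone SCM classes. */
final class BalancerPrecheckSketch {

  enum HealthState { HEALTHY, UNDER_REPLICATED, OVER_REPLICATED, MIS_REPLICATED }

  enum MoveResult { COMPLETED, REPLICATION_NOT_HEALTHY_BEFORE_MOVE }

  /** Hypothetical stand-in for asking ReplicationManager for container health. */
  interface HealthLookup {
    HealthState healthOf(long containerId);
  }

  /**
   * Selection-time filter: exclude any container that is not HEALTHY, which is
   * the same condition the move itself checks later. An unknown (null) state is
   * treated as not healthy.
   */
  static boolean shouldBeExcluded(long containerId, HealthLookup lookup) {
    return lookup.healthOf(containerId) != HealthState.HEALTHY;
  }

  /** Stand-in for the move: re-checks health, since it can change after selection. */
  static MoveResult move(long containerId, HealthLookup lookup) {
    if (lookup.healthOf(containerId) != HealthState.HEALTHY) {
      return MoveResult.REPLICATION_NOT_HEALTHY_BEFORE_MOVE;
    }
    return MoveResult.COMPLETED;
  }

  /**
   * If the move failed because replication was not healthy, put the source
   * datanode back so it can be considered for another move. Other results are
   * left to whatever handling already exists.
   *
   * @return true if the source was re-queued
   */
  static boolean requeueSourceIfUnhealthy(String sourceDatanode,
      MoveResult result, Queue<String> sources) {
    if (result == MoveResult.REPLICATION_NOT_HEALTHY_BEFORE_MOVE) {
      return sources.add(sourceDatanode);
    }
    return false;
  }

  private BalancerPrecheckSketch() { }
}
```

The move-time check is kept even with the selection-time filter, because a
container's health can change between selection and the move; that is why the
re-queue step is still needed.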