Siddhant Sangwan created HDDS-14614:
---------------------------------------

             Summary: Improve handling for REPLICATION_NOT_HEALTHY_BEFORE_MOVE in Container Balancer
                 Key: HDDS-14614
                 URL: https://issues.apache.org/jira/browse/HDDS-14614
             Project: Apache Ozone
          Issue Type: Improvement
          Components: SCM
    Affects Versions: 2.1.0
            Reporter: Siddhant Sangwan


I've seen cases where a container passes the first replication/deletion check:
{code}
private boolean isContainerReplicatingOrDeleting(ContainerID containerID) {
  return replicationManager.isContainerReplicatingOrDeleting(containerID);
}
{code}

but fails later in MoveManager, which does a second check for the same thing in 
a different way:
{code}
/*
 If the container is under, over, or mis replicated, we should let
 replication manager solve these issues first. Fail move for such a
 container.
 */
ContainerHealthResult healthBeforeMove =
    replicationManager.getContainerReplicationHealth(containerInfo,
        currentReplicas);
if (healthBeforeMove.getHealthState() !=
    ContainerHealthResult.HealthState.HEALTHY) {
  ret.complete(MoveResult.REPLICATION_NOT_HEALTHY_BEFORE_MOVE);
  return ret;
}
{code}

The MoveManager check is the stricter of the two, so
ContainerBalancerSelectionCriteria should apply the same check up front.
Regardless, if a move fails for this reason, the source datanode still needs
to be added back to the priority queue so it can be considered for another
move. See https://issues.apache.org/jira/browse/HDDS-7252.
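
The two proposed changes could look roughly like the sketch below. It is a
self-contained illustration, not Ozone code: BalancerMoveSketch, Source,
Container, isMovable, move, newSourceQueue and tryMove are all hypothetical
stand-ins, and only the REPLICATION_NOT_HEALTHY_BEFORE_MOVE result name and
the HEALTHY state are taken from the snippets above. What happens to the
source after a successful move is deliberately left out.

```java
import java.util.PriorityQueue;

/** Hypothetical sketch; none of these types are the real Ozone classes. */
final class BalancerMoveSketch {

  enum HealthState { HEALTHY, UNDER_REPLICATED, OVER_REPLICATED, MIS_REPLICATED }

  enum MoveResult { COMPLETED, REPLICATION_NOT_HEALTHY_BEFORE_MOVE }

  /** Stand-in for a source datanode (assumed to be ranked by utilization). */
  static final class Source {
    final String name;
    final double utilization;

    Source(String name, double utilization) {
      this.name = name;
      this.utilization = utilization;
    }
  }

  /** Stand-in for a container together with its replication health. */
  static final class Container {
    final long id;
    final HealthState health;

    Container(long id, HealthState health) {
      this.id = id;
      this.health = health;
    }
  }

  /** Selection-time filter that applies the same strict rule as the move step. */
  static boolean isMovable(Container container) {
    return container.health == HealthState.HEALTHY;
  }

  /** Stand-in for the move step: refuses any container that is not HEALTHY. */
  static MoveResult move(Container container) {
    return container.health == HealthState.HEALTHY
        ? MoveResult.COMPLETED
        : MoveResult.REPLICATION_NOT_HEALTHY_BEFORE_MOVE;
  }

  /** Queue that hands out the most utilized source first. */
  static PriorityQueue<Source> newSourceQueue() {
    return new PriorityQueue<>(
        (a, b) -> Double.compare(b.utilization, a.utilization));
  }

  /**
   * Takes the next source off the queue and attempts the move. If the move is
   * rejected because replication is not healthy, the source is put back so it
   * can be considered again with a different container.
   */
  static MoveResult tryMove(PriorityQueue<Source> sources, Container container) {
    Source source = sources.poll();
    if (source == null) {
      throw new IllegalStateException("no source datanode available");
    }
    MoveResult result = move(container);
    if (result == MoveResult.REPLICATION_NOT_HEALTHY_BEFORE_MOVE) {
      sources.add(source);
    }
    return result;
  }
}
```

With isMovable applied during selection, a container that is under, over, or
mis replicated would normally never reach tryMove. The requeue covers the case
where the health changes between selection and the move.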



