[ 
https://issues.apache.org/jira/browse/HDDS-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976845#comment-16976845
 ] 

Stephen O'Donnell commented on HDDS-2459:
-----------------------------------------

With the current "pre decommission" replication logic, "isContainerHealthy" is 
defined as:

{code}
private boolean isContainerHealthy(final ContainerInfo container,
                                   final Set<ContainerReplica> replicas) {
  return container.getReplicationFactor().getNumber() == replicas.size() &&
      replicas.stream().allMatch(
          r -> compareState(container.getState(), r.getState()));
}
{code}

As it stands, this check will always fail when a container has a maintenance or 
decommission replica, because the second condition fails: container.getState() 
differs from the state of those replicas. We may want to improve the check so 
that it also skips containers that are healthy apart from having decommission 
or maintenance replicas?

I wonder whether the check should:

1. Ensure there are no inflight operations.
2. Ensure the container is sufficiently replicated, based on the logic in the 
comment above.
3. Ensure the container is not over replicated.
4. Ensure any non-maintenance and non-decommission replicas have the same state 
as the container itself.
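
A rough sketch of how those four conditions might fit together is below. It is 
not the ReplicationManager code: HealthCheckSketch, its State enum and the 
inflightOps parameter are hypothetical stand-ins, and a single enum is used for 
both container and replica state, so no compareState() mapping is needed. The 
counting rule (only replicas that are neither decommissioned nor in maintenance 
count toward the replication factor) is a placeholder assumption, since the 
actual rule is the one described in the earlier comment.

```java
import java.util.List;

// Hypothetical, simplified model; these are not the real SCM classes.
final class HealthCheckSketch {

  // One enum stands in for both container and replica state (assumption).
  enum State { CLOSED, UNHEALTHY, DECOMMISSIONED, MAINTENANCE }

  // True for replicas on decommissioned or maintenance nodes.
  static boolean isOffline(State s) {
    return s == State.DECOMMISSIONED || s == State.MAINTENANCE;
  }

  static boolean isContainerHealthy(State containerState,
                                    int replicationFactor,
                                    int inflightOps,
                                    List<State> replicaStates) {
    // 1. No inflight operations.
    if (inflightOps > 0) {
      return false;
    }
    // Placeholder counting rule (assumption): only in-service replicas count.
    long inService =
        replicaStates.stream().filter(s -> !isOffline(s)).count();
    // 2. Sufficiently replicated.
    if (inService < replicationFactor) {
      return false;
    }
    // 3. Not over replicated.
    if (inService > replicationFactor) {
      return false;
    }
    // 4. Every in-service replica matches the container state.
    return replicaStates.stream()
        .filter(s -> !isOffline(s))
        .allMatch(s -> s == containerState);
  }
}
```

With three CLOSED replicas plus one DECOMMISSIONED replica and a replication 
factor of 3, this sketch reports the container as healthy, whereas the current 
check would not, because replicas.size() is 4.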

Alternatively, we can skip this check and run through the isOverReplicated(), 
isUnderReplicated() and isUnhealthy() scenarios, as we may well need to do about 
the same amount of work in the check as we do when checking for over / under 
replication or health later anyway.

> Refactor ReplicationManager to consider maintenance states
> ----------------------------------------------------------
>
>                 Key: HDDS-2459
>                 URL: https://issues.apache.org/jira/browse/HDDS-2459
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>          Components: SCM
>    Affects Versions: 0.5.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>
> In its current form the replication manager does not consider decommission or 
> maintenance states when checking if replicas are sufficiently replicated. 
> With the introduction of maintenance states, it needs to consider 
> decommission and maintenance states when deciding if blocks are over or under 
> replicated.
> It also needs to provide an API to allow the decommission manager to check if 
> blocks are over or under replicated, so the decommission manager can decide 
> if a node has completed decommission and maintenance or not.



