Sergey Soldatov created HDDS-14674:
--------------------------------------

             Summary: Node with existing QUASI_CLOSED replica can be wrongly 
selected as replication target
                 Key: HDDS-14674
                 URL: https://issues.apache.org/jira/browse/HDDS-14674
             Project: Apache Ozone
          Issue Type: Bug
          Components: SCM
    Affects Versions: 2.1.0
            Reporter: Sergey Soldatov
            Assignee: Sergey Soldatov


During RATIS under-replication handling (vulnerable/unhealthy path), SCM can 
lose visibility of some existing replicas before target selection. As a result, 
a DN that already has a replica of the same container may be incorrectly 
considered eligible as a new target.

Why it happens:

In RatisUnderReplicationHandler.processAndSendCommands(...) we create two 
counters:

withUnhealthy = new RatisContainerReplicaCount(containerInfo, replicas, 
pendingOps, ..., true)
withoutUnhealthy = new RatisContainerReplicaCount(containerInfo, replicas, 
pendingOps, ..., false)

If there are vulnerable/unhealthy replicas, we call
{*}handleVulnerableUnhealthyReplicas{*}(withUnhealthy, pendingOps)

Inside it, we call withUnhealthy.{*}getVulnerableUnhealthyReplicas{*}(...), 
which mutates the internal field *replicas* via replicas.removeIf(...).

So the *withUnhealthy* object now has a modified internal replica list.
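
The side effect can be reproduced in isolation. The following is a minimal sketch, not the Ozone code: ReplicaCountSketch, Replica, filterInPlace and filterOnCopy are hypothetical names, and the removeIf predicate is an assumption, since the real one is not quoted in this issue. It contrasts filtering the field in place with filtering a copy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for RatisContainerReplicaCount; not the Ozone class.
class ReplicaCountSketch {

  // Simplified replica: the hosting datanode and whether that node is healthy.
  static final class Replica {
    final String datanode;
    final boolean nodeHealthy;

    Replica(String datanode, boolean nodeHealthy) {
      this.datanode = datanode;
      this.nodeHealthy = nodeHealthy;
    }
  }

  private final List<Replica> replicas;

  ReplicaCountSketch(List<Replica> replicas) {
    this.replicas = new ArrayList<>(replicas);
  }

  List<Replica> getReplicas() {
    return replicas;
  }

  // Problematic shape: removeIf filters the field itself, so every later
  // getReplicas() call sees the shortened list.
  List<Replica> filterInPlace() {
    replicas.removeIf(r -> !r.nodeHealthy); // assumed predicate
    return replicas;
  }

  // Side-effect-free shape: filter a stream and leave the field untouched.
  List<Replica> filterOnCopy() {
    return replicas.stream()
        .filter(r -> r.nodeHealthy)
        .collect(Collectors.toList());
  }

  // Three replicas, one of them on a stale (unhealthy) node.
  static ReplicaCountSketch sample() {
    return new ReplicaCountSketch(Arrays.asList(
        new Replica("dn1", true),
        new Replica("dn2", true),
        new Replica("dn3", false)));
  }

  public static void main(String[] args) {
    ReplicaCountSketch copy = sample();
    copy.filterOnCopy();
    System.out.println(copy.getReplicas().size()); // prints 3

    ReplicaCountSketch inPlace = sample();
    inPlace.filterInPlace();
    System.out.println(inPlace.getReplicas().size()); // prints 2
  }
}
```

Whether the actual fix should filter a copy or pass the original replica set to the later call is a design choice this sketch does not settle.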

After that, we call 

replicateEachSource({*}withUnhealthy{*}, vulnerableUnhealthy, pendingOps)

where we do the following:
     *allReplicas* = {*}withUnhealthy{*}.getReplicas()
     ReplicationManagerUtil.getExcludedAndUsedNodes(container, 
{*}allReplicas{*}, ...)

Because *allReplicas* is read from the already-mutated list, some existing 
replica hosts (non-healthy/stale ones) may be missing from the placement 
inputs. This can allow a DN that already hosts a replica to be considered as a 
replication target.
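
The effect on target selection can be illustrated with a simplified model. This is a sketch under stated assumptions: TargetSelectionSketch and eligibleTargets are hypothetical, datanodes are plain strings, and real placement in SCM involves more than a set difference. It only shows that a host missing from the replica list is no longer excluded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simplification of placement; not ReplicationManagerUtil.
class TargetSelectionSketch {

  // Every node that does not already host a replica is treated as eligible.
  static List<String> eligibleTargets(List<String> allNodes,
                                      Collection<String> replicaHosts) {
    Set<String> excluded = new HashSet<>(replicaHosts);
    List<String> eligible = new ArrayList<>();
    for (String node : allNodes) {
      if (!excluded.contains(node)) {
        eligible.add(node);
      }
    }
    return eligible;
  }

  public static void main(String[] args) {
    List<String> allNodes = Arrays.asList("dn1", "dn2", "dn3", "dn4");

    // Complete replica list: dn3 hosts a replica and is excluded.
    List<String> fullHosts = Arrays.asList("dn1", "dn2", "dn3");
    System.out.println(eligibleTargets(allNodes, fullHosts)); // prints [dn4]

    // List after an in-place filter dropped dn3: dn3 looks eligible again.
    List<String> mutatedHosts = Arrays.asList("dn1", "dn2");
    System.out.println(eligibleTargets(allNodes, mutatedHosts)); // prints [dn3, dn4]
  }
}
```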

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
