[ https://issues.apache.org/jira/browse/HDDS-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddhant Sangwan updated HDDS-9258:
-----------------------------------
    Description: 
In RatisContainerReplicaCount, should we discount any pending deletes for 
replicas that LRM sees as unhealthy? Since we ignore UNHEALTHY replicas, it 
makes sense not to count their pending deletes either.

Suppose there's a CLOSED container with replicas:
CLOSED, CLOSED, CLOSED, UNHEALTHY (not counted; seen as an excess replica that 
can be deleted).

In the current iteration, RM sends a delete command for the UNHEALTHY replica, 
so there is now a pending delete. In the next iteration, if the delete is still 
pending, RM will see 3 CLOSED replicas - 1 pending delete + 1 UNHEALTHY 
replica. Since UNHEALTHY replicas are ignored, that is effectively 3 CLOSED 
replicas - 1 pending delete, even though the delete targets the UNHEALTHY 
replica. The effective count therefore becomes 2, which is seen as 
under-replicated, although the container is not actually under-replicated. We 
need to verify whether this is actually a bug; I have not written any tests to 
reproduce it yet.
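
A minimal Java sketch of the arithmetic above, under a deliberately simplified 
model. ReplicaCountSketch, Replica, currentCount and proposedCount are 
hypothetical names, not Ozone's actual RatisContainerReplicaCount API. 
currentCount subtracts every pending delete from the healthy count, as the 
description says happens today; proposedCount subtracts only pending deletes 
whose target replica is itself counted (not UNHEALTHY), which is the change 
being proposed.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, simplified model of the HDDS-9258 scenario; not Ozone code.
final class ReplicaCountSketch {

  enum State { CLOSED, UNHEALTHY }

  // A replica is identified here by the (hypothetical) datanode holding it.
  static final class Replica {
    final String datanode;
    final State state;

    Replica(String datanode, State state) {
      this.datanode = datanode;
      this.state = state;
    }
  }

  // Behaviour described in the issue: UNHEALTHY replicas are ignored, but
  // every pending delete is still subtracted, including one for an
  // UNHEALTHY replica that was never counted.
  static int currentCount(List<Replica> replicas, Set<String> pendingDeletes) {
    long healthy = replicas.stream()
        .filter(r -> r.state != State.UNHEALTHY)
        .count();
    return (int) (healthy - pendingDeletes.size());
  }

  // Proposed behaviour: only subtract pending deletes that target a replica
  // which is part of the count (i.e. not UNHEALTHY).
  static int proposedCount(List<Replica> replicas, Set<String> pendingDeletes) {
    Set<String> counted = replicas.stream()
        .filter(r -> r.state != State.UNHEALTHY)
        .map(r -> r.datanode)
        .collect(Collectors.toSet());
    long relevantDeletes = pendingDeletes.stream()
        .filter(counted::contains)
        .count();
    return (int) (counted.size() - relevantDeletes);
  }

  public static void main(String[] args) {
    List<Replica> replicas = List.of(
        new Replica("dn1", State.CLOSED),
        new Replica("dn2", State.CLOSED),
        new Replica("dn3", State.CLOSED),
        new Replica("dn4", State.UNHEALTHY));
    // RM has already sent a delete for the UNHEALTHY replica on dn4.
    Set<String> pendingDeletes = Set.of("dn4");
    int replicationFactor = 3;

    int current = currentCount(replicas, pendingDeletes);
    int proposed = proposedCount(replicas, pendingDeletes);
    System.out.println("current=" + current
        + " underReplicated=" + (current < replicationFactor));
    System.out.println("proposed=" + proposed
        + " underReplicated=" + (proposed < replicationFactor));
  }
}
```

For the replica set in this issue, main prints "current=2 underReplicated=true" 
and then "proposed=3 underReplicated=false". A pending delete on a CLOSED 
replica (say dn1) still reduces both counts to 2, so the proposed change only 
affects deletes aimed at replicas that were already excluded.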

  was:In RatisContainerReplicaCount, should we discount any pending deletes for 
replicas that LRM sees as unhealthy?


> LegacyReplicationManager: Pending deletes on unhealthy replicas can cause 
> calculation errors
> --------------------------------------------------------------------------------------------
>
>                 Key: HDDS-9258
>                 URL: https://issues.apache.org/jira/browse/HDDS-9258
>             Project: Apache Ozone
>          Issue Type: Sub-task
>          Components: SCM
>            Reporter: Siddhant Sangwan
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
