[
https://issues.apache.org/jira/browse/HDDS-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Siddhant Sangwan updated HDDS-9258:
-----------------------------------
Description:
In RatisContainerReplicaCount, should we discount any pending deletes for
replicas that LRM (LegacyReplicationManager) sees as unhealthy? Since we ignore
UNHEALTHY replicas when counting, it makes sense to not count their pending
deletes either.
Suppose there's a CLOSED container with replicas:
CLOSED, CLOSED, CLOSED, UNHEALTHY (not counted, seen as excess that can be
deleted).
In the current iteration, RM sends a delete command for the UNHEALTHY replica,
so now there's a pending delete. In the next iteration, if the delete is still
pending, RM will see 3 CLOSED replicas + 1 UNHEALTHY replica, with 1 pending
delete. Because UNHEALTHY replicas are ignored, that's effectively 3 CLOSED
replicas - 1 pending delete (even though the delete is for the UNHEALTHY
replica). The effective count becomes 2, which is seen as under replicated
(fewer than the 3 replicas expected in this example). Of course, this container
is not actually under replicated. We need to verify whether this is actually a
bug; I have not written any tests to reproduce it yet.
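The arithmetic above can be sketched in a few lines of Java. This is only a
simplified model of the suspected behaviour, not the actual
RatisContainerReplicaCount code: the class ReplicaCountSketch, its Replica
type, and the methods naiveCount and adjustedCount are hypothetical names made
up for illustration, and whether the real class counts this way is exactly
what still needs to be verified.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified model; NOT the real RatisContainerReplicaCount. */
final class ReplicaCountSketch {
    enum State { CLOSED, UNHEALTHY }

    /** A replica plus a flag saying whether a delete command is pending for it. */
    static final class Replica {
        final State state;
        final boolean deletePending;

        Replica(State state, boolean deletePending) {
            this.state = state;
            this.deletePending = deletePending;
        }
    }

    /**
     * Suspected behaviour: UNHEALTHY replicas are ignored, but every pending
     * delete is subtracted, even one that targets an ignored replica.
     */
    static int naiveCount(List<Replica> replicas) {
        int counted = 0;
        int pendingDeletes = 0;
        for (Replica r : replicas) {
            if (r.state != State.UNHEALTHY) {
                counted++;
            }
            if (r.deletePending) {
                pendingDeletes++;
            }
        }
        return counted - pendingDeletes;
    }

    /**
     * Proposed behaviour: a pending delete only reduces the count when it
     * targets a replica that was counted in the first place.
     */
    static int adjustedCount(List<Replica> replicas) {
        int counted = 0;
        for (Replica r : replicas) {
            if (r.state != State.UNHEALTHY && !r.deletePending) {
                counted++;
            }
        }
        return counted;
    }

    public static void main(String[] args) {
        // The scenario from the description: 3 CLOSED replicas plus
        // 1 UNHEALTHY replica whose delete is still pending.
        List<Replica> replicas = Arrays.asList(
            new Replica(State.CLOSED, false),
            new Replica(State.CLOSED, false),
            new Replica(State.CLOSED, false),
            new Replica(State.UNHEALTHY, true));

        System.out.println("naive    = " + naiveCount(replicas));    // 3 - 1 = 2
        System.out.println("adjusted = " + adjustedCount(replicas)); // 3
    }
}
```

With the scenario's replica set, the naive count drops to 2 while the adjusted
count stays at 3, which is the discrepancy a reproducing test would need to
confirm against the real class.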
was: In RatisContainerReplicaCount, should we discount any pending deletes for
replicas that LRM sees as unhealthy?
> LegacyReplicationManager: Pending deletes on unhealthy replicas can cause
> calculation errors
> --------------------------------------------------------------------------------------------
>
> Key: HDDS-9258
> URL: https://issues.apache.org/jira/browse/HDDS-9258
> Project: Apache Ozone
> Issue Type: Sub-task
> Components: SCM
> Reporter: Siddhant Sangwan
> Priority: Major