[ https://issues.apache.org/jira/browse/HDDS-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddhant Sangwan updated HDDS-9321:
-----------------------------------
    Description: 
A mix of quasi-closed and unhealthy replicas blocks decommissioning, even if 
the container is sufficiently replicated.
a. This happens when only some of the replicas hit an error during a write.
b. It can be fixed by removing this check:
{code}
if (!replicaSet.isHealthy()) {
  if (LOG.isDebugEnabled()) {
    unhealthyIDs.add(cid);
  }
  if (unhealthy < CONTAINER_DETAILS_LOGGING_LIMIT
{code}
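The difference between the current check and the intended behaviour can be sketched with a deliberately simplified model. Everything here is an assumption for illustration: `DecommissionCheckSketch`, `State`, `Replica`, `blockedByHealthCheck` and `sufficientlyReplicated` are hypothetical names, not Ozone classes, and "sufficiently replicated" is modelled as "at least replication-factor non-UNHEALTHY replicas on nodes other than the one being decommissioned".

```java
import java.util.List;

// Hypothetical, simplified model of the decommission check; none of these
// names are Ozone classes or APIs.
public class DecommissionCheckSketch {

  public enum State { CLOSED, QUASI_CLOSED, UNHEALTHY }

  // A replica of one container: the datanode holding it and its state.
  public record Replica(String datanode, State state) {}

  // Models the current behaviour: any UNHEALTHY replica makes the replica
  // set "not healthy", which blocks decommissioning.
  public static boolean blockedByHealthCheck(List<Replica> replicas) {
    return replicas.stream().anyMatch(r -> r.state() == State.UNHEALTHY);
  }

  // Models the intended behaviour: only ask whether enough non-UNHEALTHY
  // replicas remain on nodes other than the one being decommissioned.
  public static boolean sufficientlyReplicated(
      List<Replica> replicas, String decommissioningNode, int replicationFactor) {
    long usable = replicas.stream()
        .filter(r -> !r.datanode().equals(decommissioningNode))
        .filter(r -> r.state() != State.UNHEALTHY)
        .count();
    return usable >= replicationFactor;
  }

  public static void main(String[] args) {
    // dn1 is being decommissioned; dn2-dn4 hold QUASI_CLOSED copies and
    // dn5 hit an error during a write, leaving an UNHEALTHY replica.
    List<Replica> replicas = List.of(
        new Replica("dn1", State.QUASI_CLOSED),
        new Replica("dn2", State.QUASI_CLOSED),
        new Replica("dn3", State.QUASI_CLOSED),
        new Replica("dn4", State.QUASI_CLOSED),
        new Replica("dn5", State.UNHEALTHY));

    System.out.println("old check blocks: " + blockedByHealthCheck(replicas));
    System.out.println("sufficiently replicated without dn1: "
        + sufficientlyReplicated(replicas, "dn1", 3));
  }
}
```

In this model the container is sufficiently replicated without dn1, yet the health check alone would still hold up the decommission because of dn5.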

However, simply removing that check is not a complete solution: we also need to 
try to preserve any UNHEALTHY replicas that have the greatest Sequence ID, 
since such a replica may be the most up-to-date copy of the container's data.
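A minimal sketch of selecting the UNHEALTHY replicas to preserve, under stated assumptions: "greatest Sequence ID" is taken to mean the maximum over all replicas of the container, and the decommission rule in `mustReplicateBeforeDecommission` (do not let a node finish decommissioning while it holds such a replica and no other node has a replica with that sequence ID) is a hypothetical policy, not Ozone's. `PreserveUnhealthySketch`, `State` and `Replica` are illustrative names only.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch; these names are illustrative, not Ozone APIs.
public class PreserveUnhealthySketch {

  public enum State { CLOSED, QUASI_CLOSED, UNHEALTHY }

  // A replica of one container: holder, state and sequence ID.
  public record Replica(String datanode, State state, long sequenceId) {}

  // Greatest sequence ID across all replicas of the container.
  static long maxSequenceId(List<Replica> replicas) {
    return replicas.stream()
        .mapToLong(Replica::sequenceId)
        .max()
        .orElse(Long.MIN_VALUE);
  }

  // UNHEALTHY replicas whose sequence ID equals the greatest one seen.
  public static List<Replica> unhealthyToPreserve(List<Replica> replicas) {
    long max = maxSequenceId(replicas);
    return replicas.stream()
        .filter(r -> r.state() == State.UNHEALTHY && r.sequenceId() == max)
        .collect(Collectors.toList());
  }

  // Assumed policy: if this node holds an UNHEALTHY replica to preserve and
  // no other node has a replica with the greatest sequence ID, that replica
  // should be copied elsewhere before the node finishes decommissioning.
  public static boolean mustReplicateBeforeDecommission(
      List<Replica> replicas, String node) {
    long max = maxSequenceId(replicas);
    boolean holdsPreserved = unhealthyToPreserve(replicas).stream()
        .anyMatch(r -> r.datanode().equals(node));
    boolean copyElsewhere = replicas.stream()
        .anyMatch(r -> !r.datanode().equals(node) && r.sequenceId() == max);
    return holdsPreserved && !copyElsewhere;
  }

  public static void main(String[] args) {
    List<Replica> replicas = List.of(
        new Replica("dn1", State.QUASI_CLOSED, 100),
        new Replica("dn2", State.QUASI_CLOSED, 100),
        new Replica("dn3", State.QUASI_CLOSED, 100),
        new Replica("dn4", State.UNHEALTHY, 105),
        new Replica("dn5", State.UNHEALTHY, 90));

    System.out.println(unhealthyToPreserve(replicas));
    System.out.println(mustReplicateBeforeDecommission(replicas, "dn4"));
    System.out.println(mustReplicateBeforeDecommission(replicas, "dn5"));
  }
}
```

Here only dn4's UNHEALTHY replica is kept, since it carries sequence ID 105 while every other replica is behind; dn5's UNHEALTHY replica at 90 can be discarded like any other surplus copy.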

  was:
A mix of quasi-closed and unhealthy replicas blocks decommissioning, even if 
the container is sufficiently replicated.
a. This happens when only some of the replicas hit an error during a write.
b. It can be fixed by removing this check:
{code}
if (!replicaSet.isHealthy()) {
  if (LOG.isDebugEnabled()) {
    unhealthyIDs.add(cid);
  }
  if (unhealthy < CONTAINER_DETAILS_LOGGING_LIMIT
{code}


> LegacyReplicationManager: Unhealthy replicas of a sufficiently replicated container can block decommissioning
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HDDS-9321
>                 URL: https://issues.apache.org/jira/browse/HDDS-9321
>             Project: Apache Ozone
>          Issue Type: Sub-task
>          Components: SCM
>            Reporter: Siddhant Sangwan
>            Assignee: Siddhant Sangwan
>            Priority: Major
>
> A mix of quasi-closed and unhealthy replicas blocks decommissioning, even if 
> the container is sufficiently replicated.
> a. This happens when only some of the replicas hit an error during a write.
> b. It can be fixed by removing this check:
> {code}
> if (!replicaSet.isHealthy()) {
>   if (LOG.isDebugEnabled()) {
>     unhealthyIDs.add(cid);
>   }
>   if (unhealthy < CONTAINER_DETAILS_LOGGING_LIMIT
> {code}
> However, simply removing that check is not a complete solution: we also need 
> to try to preserve any UNHEALTHY replicas that have the greatest Sequence ID, 
> since such a replica may be the most up-to-date copy of the container's data.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
