[ 
https://issues.apache.org/jira/browse/HDDS-7640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDDS-7640:
------------------------------------
    Description: 
Scenario :
In a cluster (13 datanodes) with a bucket using EC replication RS-3-2-1024K, 
2 of the replicas of a key's container, which is in CLOSED state, are made 
UNHEALTHY.
{code:java}
Replica Index 1: Closed
Replica Index 2: Unhealthy
Replica Index 3: Closed
Replica Index 4: Unhealthy
Replica Index 5: Closed{code}

Expected behaviour - The container should be identified as under replicated due 
to the unhealthy replicas. Once it is fully replicated, the unhealthy 
replicas should be removed.

Observed behaviour - The UNHEALTHY replicas are still in the container after 12 
hours and the under replication is not identified or fixed.
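
To illustrate the expected check, here is a minimal sketch of how the scenario above could be classified as under replicated. It is not Ozone's ReplicationManager code: the class, enum and method names are hypothetical, and it assumes only that a replica index counts as available when it has a CLOSED replica.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch; not taken from the Ozone code base. */
class EcUnderReplicationSketch {

  /** Replica states named in this report (assumed enum, for illustration). */
  enum ReplicaState { OPEN, CLOSING, CLOSED, UNHEALTHY }

  /**
   * Returns the replica indexes (1..dataNum + parityNum) that have no CLOSED
   * replica, i.e. the indexes that would need to be reconstructed.
   */
  static List<Integer> missingIndexes(Map<Integer, ReplicaState> replicas,
      int dataNum, int parityNum) {
    List<Integer> missing = new ArrayList<>();
    for (int index = 1; index <= dataNum + parityNum; index++) {
      // An UNHEALTHY or absent replica is not counted as a usable copy.
      if (replicas.get(index) != ReplicaState.CLOSED) {
        missing.add(index);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // The replica layout from the scenario: indexes 2 and 4 are UNHEALTHY.
    Map<Integer, ReplicaState> replicas = new TreeMap<>();
    replicas.put(1, ReplicaState.CLOSED);
    replicas.put(2, ReplicaState.UNHEALTHY);
    replicas.put(3, ReplicaState.CLOSED);
    replicas.put(4, ReplicaState.UNHEALTHY);
    replicas.put(5, ReplicaState.CLOSED);

    List<Integer> missing = missingIndexes(replicas, 3, 2);
    System.out.println("Under replicated: " + !missing.isEmpty());
    System.out.println("Indexes to reconstruct: " + missing);
  }
}
```

For RS-3-2, three CLOSED indexes are still enough to reconstruct the other two, so in this sketch the container would be under replicated but recoverable.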

The reason is that the container is handled in 
ClosedWithMismatchedReplicasHandler, because not all replica states match the 
container state (CLOSED with 2 UNHEALTHY). However, we should not consider 
UNHEALTHY in ClosedWithMismatchedReplicasHandler, as it is a special state. The 
handler is intended more for a CLOSED container with OPEN or CLOSING replicas.
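
The idea described above can be sketched as follows. This is only an illustration under assumptions, not the actual handler or the eventual patch: the class, enum and method names are hypothetical, and the only behaviour taken from this report is that UNHEALTHY replicas should not count as a state mismatch for a CLOSED container.

```java
import java.util.List;

/** Hypothetical sketch; not the real ClosedWithMismatchedReplicasHandler. */
class MismatchedReplicaSketch {

  /** Replica and container states named in this report (assumed enum). */
  enum State { OPEN, CLOSING, CLOSED, UNHEALTHY }

  /**
   * Returns true only if a CLOSED container has a replica that still has to
   * be closed (for example OPEN or CLOSING). UNHEALTHY replicas are skipped,
   * so such a container can fall through to under-replication handling.
   */
  static boolean hasMismatchedReplicas(State containerState,
      List<State> replicaStates) {
    if (containerState != State.CLOSED) {
      return false;
    }
    for (State replica : replicaStates) {
      if (replica != State.CLOSED && replica != State.UNHEALTHY) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // The replica layout from the scenario: CLOSED with 2 UNHEALTHY.
    List<State> scenario = List.of(State.CLOSED, State.UNHEALTHY,
        State.CLOSED, State.UNHEALTHY, State.CLOSED);
    System.out.println("Scenario is a mismatch: "
        + hasMismatchedReplicas(State.CLOSED, scenario));

    // A CLOSED container with a CLOSING replica is the case the handler is for.
    List<State> closing = List.of(State.CLOSED, State.CLOSING,
        State.CLOSED, State.CLOSED, State.CLOSED);
    System.out.println("CLOSING replica is a mismatch: "
        + hasMismatchedReplicas(State.CLOSED, closing));
  }
}
```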

  was:
Scenario :
In a cluster(13 datanodes) having bucket of EC replication RS-3-2-1024K, for a 
container of a key which is in CLOSED state, 2 of the replicas in the cluster 
are made UNHEALTHY.
{code:java}
Replica Index 1: Closed
Replica Index 2: Unhealthy
Replica Index 3: Closed
Replica Index 4: Unhealthy
Replica Index 5: Closed{code}
Expected behaviour - The scrubber should remove the UNHEALTHY replicas and 
replace them with a new healthy replica.

Observed behaviour - The UNHEALTHY replicas are still in the container after 12 
hours.

Container info:
{code:java}
[root@vc0108 ~]# ozone admin container info 3001
Container id: 3001
Pipeline id: 5b6d07b9-fb25-4303-b784-c3578519374b
Container State: CLOSED
Datanodes: [cd68d7a3-8b52-4ab8-a12c-6871fda5a0ec/vc0114.halxg.cloudera.com,
9f11d53f-a0c3-44bd-9506-60ec87a8ecad/vc0108.halxg.cloudera.com,
06d92566-3613-43d0-af92-70205c0f45a6/vc0112.halxg.cloudera.com,
f8efdcfb-e418-46a2-bd52-9baf78cbb1eb/vc0111.halxg.cloudera.com,
4623c856-03eb-479b-8f78-74401bb3f844/vc0117.halxg.cloudera.com]
Replicas: [State: CLOSED; ReplicaIndex: 1; Origin: 
cd68d7a3-8b52-4ab8-a12c-6871fda5a0ec; Location: 
cd68d7a3-8b52-4ab8-a12c-6871fda5a0ec/vc0114.halxg.cloudera.com,
State: UNHEALTHY; ReplicaIndex: 2; Origin: 
06d92566-3613-43d0-af92-70205c0f45a6; Location: 
06d92566-3613-43d0-af92-70205c0f45a6/vc0112.halxg.cloudera.com,
State: CLOSED; ReplicaIndex: 3; Origin: 9f11d53f-a0c3-44bd-9506-60ec87a8ecad; 
Location: 9f11d53f-a0c3-44bd-9506-60ec87a8ecad/vc0108.halxg.cloudera.com,
State: UNHEALTHY; ReplicaIndex: 4; Origin: 
4623c856-03eb-479b-8f78-74401bb3f844; Location: 
4623c856-03eb-479b-8f78-74401bb3f844/vc0117.halxg.cloudera.com,
State: CLOSED; ReplicaIndex: 5; Origin: f8efdcfb-e418-46a2-bd52-9baf78cbb1eb; 
Location: f8efdcfb-e418-46a2-bd52-9baf78cbb1eb/vc0111.halxg.cloudera.com]{code}


> EC: UNHEALTHY replicas not replaced by healthy replicas from a CLOSED 
> container by RM
> -------------------------------------------------------------------------------------
>
>                 Key: HDDS-7640
>                 URL: https://issues.apache.org/jira/browse/HDDS-7640
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: ECOfflineRecovery, SCM
>    Affects Versions: 1.3.0
>            Reporter: Jyotirmoy Sinha
>            Assignee: Siddhant Sangwan
>            Priority: Major
>
> Scenario :
> In a cluster (13 datanodes) with a bucket using EC replication RS-3-2-1024K, 
> 2 of the replicas of a key's container, which is in CLOSED state, are made 
> UNHEALTHY.
> {code:java}
> Replica Index 1: Closed
> Replica Index 2: Unhealthy
> Replica Index 3: Closed
> Replica Index 4: Unhealthy
> Replica Index 5: Closed{code}
> Expected behaviour - The container should be identified as under replicated 
> due to the unhealthy replicas. Once it is fully replicated, the unhealthy 
> replicas should be removed.
> Observed behaviour - The UNHEALTHY replicas are still in the container after 
> 12 hours and the under replication is not identified or fixed.
> The reason is that the container is handled in 
> ClosedWithMismatchedReplicasHandler, because not all replica states match the 
> container state (CLOSED with 2 UNHEALTHY). However, we should not consider 
> UNHEALTHY in ClosedWithMismatchedReplicasHandler, as it is a special state. The 
> handler is intended more for a CLOSED container with OPEN or CLOSING replicas.


