[
https://issues.apache.org/jira/browse/HDDS-7640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stephen O'Donnell updated HDDS-7640:
------------------------------------
Description:
Scenario :
In a cluster (13 datanodes) with a bucket using EC replication RS-3-2-1024K, two
replicas of a CLOSED container belonging to a key are made UNHEALTHY.
{code:java}
Replica Index 1: Closed
Replica Index 2: Unhealthy
Replica Index 3: Closed
Replica Index 4: Unhealthy
Replica Index 5: Closed{code}
Expected behaviour - The container should be identified as under-replicated due
to the unhealthy replicas. Once it is fully replicated, the unhealthy
replicas should be removed.
Observed behaviour - The UNHEALTHY replicas are still in the container after 12
hours and the under-replication is not identified or fixed.
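To make the expected behaviour concrete, here is a minimal sketch of how the under-replication could be detected if UNHEALTHY replicas were treated as missing. It is not the Replication Manager code: the class, enum, and method names are hypothetical, and it assumes at most one replica per index.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch only; none of these names come from Ozone.
final class EcUnderReplicationSketch {
  // Simplified stand-in for replica states.
  enum ReplicaState { OPEN, CLOSING, CLOSED, UNHEALTHY }

  // Returns the replica indexes (1..data+parity) that have no CLOSED replica.
  // An UNHEALTHY or absent replica counts as missing.
  static Set<Integer> missingIndexes(Map<Integer, ReplicaState> replicas,
                                     int data, int parity) {
    Set<Integer> missing = new TreeSet<>();
    for (int index = 1; index <= data + parity; index++) {
      if (replicas.get(index) != ReplicaState.CLOSED) {
        missing.add(index);
      }
    }
    return missing;
  }

  // Reed-Solomon with `parity` parity blocks can rebuild up to `parity` lost indexes.
  static boolean isRecoverable(Set<Integer> missing, int parity) {
    return missing.size() <= parity;
  }

  public static void main(String[] args) {
    // The scenario from this issue: RS-3-2 with indexes 2 and 4 UNHEALTHY.
    Map<Integer, ReplicaState> replicas = new LinkedHashMap<>();
    replicas.put(1, ReplicaState.CLOSED);
    replicas.put(2, ReplicaState.UNHEALTHY);
    replicas.put(3, ReplicaState.CLOSED);
    replicas.put(4, ReplicaState.UNHEALTHY);
    replicas.put(5, ReplicaState.CLOSED);

    Set<Integer> missing = missingIndexes(replicas, 3, 2);
    System.out.println("Indexes needing a healthy replica: " + missing);
    System.out.println("Recoverable: " + isRecoverable(missing, 2));
  }
}
```

Under these assumptions, indexes 2 and 4 are reported as needing reconstruction, and the container is still recoverable because three healthy indexes remain.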
The reason is that the container is handled in
ClosedWithMismatchedReplicasHandler because not all replica states match the
container state (CLOSED with 2 UNHEALTHY). However, we should not consider
UNHEALTHY in ClosedWithMismatchedReplicasHandler, as it is a special state. The
handler is intended more for a CLOSED container with OPEN or CLOSING replicas.
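The proposed change can be illustrated with a small sketch of a mismatch check that ignores UNHEALTHY replicas. This is an assumption about the shape of the fix, not the actual ClosedWithMismatchedReplicasHandler code; the class, enum, and method names below are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only; none of these names come from Ozone.
final class MismatchedReplicaSketch {
  // Simplified stand-in for container and replica states.
  enum State { OPEN, CLOSING, CLOSED, UNHEALTHY }

  // A replica is "mismatched" when its state differs from the container state,
  // except that UNHEALTHY is excluded because it is a special state.
  static boolean isMismatched(State containerState, State replicaState) {
    return replicaState != containerState && replicaState != State.UNHEALTHY;
  }

  // The handler should claim a CLOSED container only if some replica is
  // mismatched in the sense above (for example OPEN or CLOSING).
  static boolean shouldHandle(State containerState, List<State> replicaStates) {
    if (containerState != State.CLOSED) {
      return false;
    }
    for (State replicaState : replicaStates) {
      if (isMismatched(containerState, replicaState)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // The scenario from this issue: CLOSED container, replicas 2 and 4 UNHEALTHY.
    List<State> scenario = Arrays.asList(
        State.CLOSED, State.UNHEALTHY, State.CLOSED, State.UNHEALTHY, State.CLOSED);
    System.out.println("Handled as mismatched: "
        + shouldHandle(State.CLOSED, scenario));

    // A CLOSED container with a CLOSING replica is what the handler targets.
    List<State> closing = Arrays.asList(
        State.CLOSED, State.CLOSING, State.CLOSED, State.CLOSED, State.CLOSED);
    System.out.println("Handled as mismatched: "
        + shouldHandle(State.CLOSED, closing));
  }
}
```

With a check like this, the container from this issue would not be claimed by the mismatched-replicas handler and could go on to the under-replication checks.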
was:
Scenario :
In a cluster (13 datanodes) with a bucket using EC replication RS-3-2-1024K, two
replicas of a CLOSED container belonging to a key are made UNHEALTHY.
{code:java}
Replica Index 1: Closed
Replica Index 2: Unhealthy
Replica Index 3: Closed
Replica Index 4: Unhealthy
Replica Index 5: Closed{code}
Expected behaviour - The scrubber should remove the UNHEALTHY replicas and
replace them with a new healthy replica.
Observed behaviour - The UNHEALTHY replicas are still in the container after 12
hours.
Container info:
{code:java}
[root@vc0108 ~]# ozone admin container info 3001
Container id: 3001
Pipeline id: 5b6d07b9-fb25-4303-b784-c3578519374b
Container State: CLOSED
Datanodes: [cd68d7a3-8b52-4ab8-a12c-6871fda5a0ec/vc0114.halxg.cloudera.com,
9f11d53f-a0c3-44bd-9506-60ec87a8ecad/vc0108.halxg.cloudera.com,
06d92566-3613-43d0-af92-70205c0f45a6/vc0112.halxg.cloudera.com,
f8efdcfb-e418-46a2-bd52-9baf78cbb1eb/vc0111.halxg.cloudera.com,
4623c856-03eb-479b-8f78-74401bb3f844/vc0117.halxg.cloudera.com]
Replicas: [State: CLOSED; ReplicaIndex: 1; Origin:
cd68d7a3-8b52-4ab8-a12c-6871fda5a0ec; Location:
cd68d7a3-8b52-4ab8-a12c-6871fda5a0ec/vc0114.halxg.cloudera.com,
State: UNHEALTHY; ReplicaIndex: 2; Origin:
06d92566-3613-43d0-af92-70205c0f45a6; Location:
06d92566-3613-43d0-af92-70205c0f45a6/vc0112.halxg.cloudera.com,
State: CLOSED; ReplicaIndex: 3; Origin: 9f11d53f-a0c3-44bd-9506-60ec87a8ecad;
Location: 9f11d53f-a0c3-44bd-9506-60ec87a8ecad/vc0108.halxg.cloudera.com,
State: UNHEALTHY; ReplicaIndex: 4; Origin:
4623c856-03eb-479b-8f78-74401bb3f844; Location:
4623c856-03eb-479b-8f78-74401bb3f844/vc0117.halxg.cloudera.com,
State: CLOSED; ReplicaIndex: 5; Origin: f8efdcfb-e418-46a2-bd52-9baf78cbb1eb;
Location: f8efdcfb-e418-46a2-bd52-9baf78cbb1eb/vc0111.halxg.cloudera.com]{code}
> EC: UNHEALTHY replicas not replaced by healthy replicas from a CLOSED
> container by RM
> -------------------------------------------------------------------------------------
>
> Key: HDDS-7640
> URL: https://issues.apache.org/jira/browse/HDDS-7640
> Project: Apache Ozone
> Issue Type: Bug
> Components: ECOfflineRecovery, SCM
> Affects Versions: 1.3.0
> Reporter: Jyotirmoy Sinha
> Assignee: Siddhant Sangwan
> Priority: Major