[
https://issues.apache.org/jira/browse/HDDS-7098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17698020#comment-17698020
]
Mladjan Gadzic commented on HDDS-7098:
--------------------------------------
[~erose] It looks like the Recon API does show that. I followed [~NeilJoshi]'s
instructions to verify this on an unsecured Ozone cluster with 3 DNs.
{code:bash}
bash-4.2$ ozone admin container info 1
Container id: 1
Pipeline id: 0edb486b-4bdc-478b-bc73-e4dc69e422e5
Container State: OPEN
Datanodes: [d2ee3d1f-10f9-452d-b692-576edc6b6096/ozone-datanode-3.ozone_default,
4bac7a54-31ec-4795-b904-4ba5ff3f7343/ozone-datanode-1.ozone_default,
4ebf5f8e-7ea7-409e-a4eb-c1d0ef6c57ed/ozone-datanode-2.ozone_default]
Replicas: [State: OPEN; ReplicaIndex: 0; Origin:
4ebf5f8e-7ea7-409e-a4eb-c1d0ef6c57ed; Location:
4ebf5f8e-7ea7-409e-a4eb-c1d0ef6c57ed/ozone-datanode-2.ozone_default,
State: OPEN; ReplicaIndex: 0; Origin: 4bac7a54-31ec-4795-b904-4ba5ff3f7343;
Location: 4bac7a54-31ec-4795-b904-4ba5ff3f7343/ozone-datanode-1.ozone_default,
State: OPEN; ReplicaIndex: 0; Origin: d2ee3d1f-10f9-452d-b692-576edc6b6096;
Location: d2ee3d1f-10f9-452d-b692-576edc6b6096/ozone-datanode-3.ozone_default]
bash-4.2$ ozone admin container close 1
bash-4.2$ ozone admin container info 1
Container id: 1
Pipeline id: 0edb486b-4bdc-478b-bc73-e4dc69e422e5
Container State: CLOSING
Datanodes: [d2ee3d1f-10f9-452d-b692-576edc6b6096/ozone-datanode-3.ozone_default,
4bac7a54-31ec-4795-b904-4ba5ff3f7343/ozone-datanode-1.ozone_default,
4ebf5f8e-7ea7-409e-a4eb-c1d0ef6c57ed/ozone-datanode-2.ozone_default]
Replicas: [State: OPEN; ReplicaIndex: 0; Origin:
4ebf5f8e-7ea7-409e-a4eb-c1d0ef6c57ed; Location:
4ebf5f8e-7ea7-409e-a4eb-c1d0ef6c57ed/ozone-datanode-2.ozone_default,
State: OPEN; ReplicaIndex: 0; Origin: d2ee3d1f-10f9-452d-b692-576edc6b6096;
Location: d2ee3d1f-10f9-452d-b692-576edc6b6096/ozone-datanode-3.ozone_default,
State: OPEN; ReplicaIndex: 0; Origin: 4bac7a54-31ec-4795-b904-4ba5ff3f7343;
Location: 4bac7a54-31ec-4795-b904-4ba5ff3f7343/ozone-datanode-1.ozone_default]
bash-4.2$ ozone admin container info 1
Container id: 1
Pipeline id: e203da07-5c9e-44c3-96d7-31d960dbb4c1
Container State: CLOSED
Datanodes: [4ebf5f8e-7ea7-409e-a4eb-c1d0ef6c57ed/ozone-datanode-2.ozone_default,
d2ee3d1f-10f9-452d-b692-576edc6b6096/ozone-datanode-3.ozone_default,
4bac7a54-31ec-4795-b904-4ba5ff3f7343/ozone-datanode-1.ozone_default]
Replicas: [State: CLOSED; ReplicaIndex: 0; Origin:
4ebf5f8e-7ea7-409e-a4eb-c1d0ef6c57ed; Location:
4ebf5f8e-7ea7-409e-a4eb-c1d0ef6c57ed/ozone-datanode-2.ozone_default,
State: CLOSED; ReplicaIndex: 0; Origin: d2ee3d1f-10f9-452d-b692-576edc6b6096;
Location: d2ee3d1f-10f9-452d-b692-576edc6b6096/ozone-datanode-3.ozone_default,
State: CLOSED; ReplicaIndex: 0; Origin: 4bac7a54-31ec-4795-b904-4ba5ff3f7343;
Location: 4bac7a54-31ec-4795-b904-4ba5ff3f7343/ozone-datanode-1.ozone_default]
{code}
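As a rough sketch (not part of Ozone), the replica states in the output above could be pulled out with a small script. It assumes the text layout is exactly what {{ozone admin container info}} printed here, which may differ in other versions, and the function names {{parse_container_info}} and {{replicas_not_in_state}} are made up for illustration.

```python
import re

# One replica entry, as printed in the output quoted above (assumed format).
REPLICA_RE = re.compile(
    r"State:\s*(?P<state>\w+);\s*ReplicaIndex:\s*(?P<index>\d+);\s*"
    r"Origin:\s*(?P<origin>[0-9a-f-]+)"
)
CONTAINER_STATE_RE = re.compile(r"Container State:\s*(\w+)")


def parse_container_info(text):
    """Return (container_state, [(origin_uuid, replica_state), ...])."""
    # The CLI output wraps long lines, so collapse all whitespace first.
    flat = " ".join(text.split())
    match = CONTAINER_STATE_RE.search(flat)
    container_state = match.group(1) if match else None
    replicas = [(m.group("origin"), m.group("state"))
                for m in REPLICA_RE.finditer(flat)]
    return container_state, replicas


def replicas_not_in_state(text, wanted):
    """Hypothetical helper: origins of replicas whose state differs from `wanted`."""
    _, replicas = parse_container_info(text)
    return [origin for origin, state in replicas if state != wanted]
```

Fed the second snapshot above (container CLOSING, replicas still OPEN), {{replicas_not_in_state(text, "CLOSED")}} would return all three origin UUIDs. If the CLI offers structured output in your version, parsing that is likely more robust than a regex.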
The Recon API response from [http://localhost:9888/api/v1/containers/unhealthy] was:
{code:json}
{"missingCount": 1,"underReplicatedCount": 0,"overReplicatedCount":
0,"misReplicatedCount": 0,"containers": []}{code}
After removing the Docker DN containers, the Recon API response for the same
endpoint was:
{code:json}
{"missingCount": 1,"underReplicatedCount": 0,"overReplicatedCount":
0,"misReplicatedCount": 0,"containers": [{"containerID": 1,"containerState":
"MISSING","unhealthySince": 1678295590336,"expectedReplicaCount":
3,"actualReplicaCount": 0,"replicaDeltaCount": 3,"reason": null,"keys":
1000,"pipelineID": "0edb486b-4bdc-478b-bc73-e4dc69e422e5","replicas":
[{"containerId": 1,"datanodeUuid":
"4ebf5f8e-7ea7-409e-a4eb-c1d0ef6c57ed","datanodeHost":
"ozone-datanode-2.ozone_default","firstSeenTime": 1678295188710,"lastSeenTime":
1678295529741,"lastBcsId": 0},{"containerId": 1,"datanodeUuid":
"4bac7a54-31ec-4795-b904-4ba5ff3f7343","datanodeHost":
"ozone-datanode-1.ozone_default","firstSeenTime": 1678295188687,"lastSeenTime":
1678295464727,"lastBcsId": 0},{"containerId": 1,"datanodeUuid":
"d2ee3d1f-10f9-452d-b692-576edc6b6096","datanodeHost":
"ozone-datanode-3.ozone_default","firstSeenTime": 1678295188710,"lastSeenTime":
1678295464721,"lastBcsId": 0}]}]}{code}
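To make a response like the one above easier to read, a minimal sketch along these lines could summarize it. The field names are taken from the JSON shown here; the function name {{summarize_unhealthy}} is hypothetical, and treating the timestamps as milliseconds since the Unix epoch is an assumption based on their magnitude.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_utc(ms):
    """Convert epoch milliseconds (assumed unit) to an ISO-8601 UTC string."""
    return (EPOCH + timedelta(milliseconds=ms)).isoformat()


def summarize_unhealthy(response_text):
    """Return (missingCount, rows) for a /containers/unhealthy response body."""
    data = json.loads(response_text)
    rows = []
    for container in data.get("containers", []):
        replicas = container.get("replicas", [])
        last_seen = [r["lastSeenTime"] for r in replicas]
        rows.append({
            "containerID": container["containerID"],
            "state": container["containerState"],
            "expected": container["expectedReplicaCount"],
            "actual": container["actualReplicaCount"],
            # Hosts that last reported a replica of this container.
            "hosts": sorted(r["datanodeHost"] for r in replicas),
            # Most recent time any datanode reported this container.
            "lastSeenUtc": millis_to_utc(max(last_seen)) if last_seen else None,
        })
    return data["missingCount"], rows
```

The function only parses text, so the body can come from any HTTP client pointed at the endpoint above. For the second response, it would report container 1 as MISSING with 3 expected and 0 actual replicas, last seen on the three datanode hosts.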
> Provide a way for admin to identify all unhealthy container replicas
> --------------------------------------------------------------------
>
> Key: HDDS-7098
> URL: https://issues.apache.org/jira/browse/HDDS-7098
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Ethan Rose
> Assignee: Devesh Kumar Singh
> Priority: Major
> Attachments: MissingContainers.png, image-2023-03-02-16-01-07-814.png
>
>
> Currently UNHEALTHY is a state that a container replica can be in
> (ContainerReplicaProto#State), but not a state that the container can be in
> overall (LifeCycleState). This means {{ozone admin container list}} has no
> info about unhealthy containers, because it currently does not print replica
> information. [Recon's
> API|https://ozone.apache.org/docs/current/interface/reconapi.html] and UI
> do not expose replica information either. The only way to determine
> unhealthy containers is to run {{ozone admin container info <ID>}} for a
> container that is already suspected to have unhealthy replicas. This jira
> aims to provide a way to identify and filter container replica states,
> through either Recon's UI, Recon's REST API, or client CLI.