[ https://issues.apache.org/jira/browse/HDDS-12680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Krishna Kumar Asawa reassigned HDDS-12680:
------------------------------------------
Assignee: Sarveksha Yeshavantha Raju
> Client hangs indefinitely on an UNHEALTHY container state
> ---------------------------------------------------------
>
> Key: HDDS-12680
> URL: https://issues.apache.org/jira/browse/HDDS-12680
> Project: Apache Ozone
> Issue Type: Bug
> Components: Ozone Client
> Reporter: Soumitra Sulav
> Assignee: Sarveksha Yeshavantha Raju
> Priority: Major
>
> If a client encounters a container with an UNHEALTHY replica, it keeps retrying
> indefinitely and the write never completes.
> {code:java}
> # ozone admin container info 24102
> Container id: 24102
> Pipeline id: 7a7842b2-4f70-439d-a9ce-a743be876465
> Container State: CLOSING
> Datanodes: [node1, node2, node3]
> Replicas: [State: UNHEALTHY; ReplicaIndex: 0; Origin: 926178e6-69a1-41de-97d1-a619d9c8cb8a; Location: 926178e6-69a1-41de-97d1-a619d9c8cb8a/node1,
> State: CLOSING; ReplicaIndex: 0; Origin: e751fa12-8be2-4ee4-9655-16ef7d8b1a69; Location: e751fa12-8be2-4ee4-9655-16ef7d8b1a69/node2,
> State: CLOSING; ReplicaIndex: 0; Origin: 0427b0f0-f9da-4aeb-9f4c-2f6887182085; Location: 0427b0f0-f9da-4aeb-9f4c-2f6887182085/node3]
> {code}
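The `container info` output above captures the trigger condition: container 24102 is still CLOSING while the replica on node1 is UNHEALTHY and the other two are CLOSING. A minimal sketch of flagging that combination from the printed fields, for example in a test harness. `ReplicaCheck`, `ReplicaState` and `Replica` are hypothetical types that mirror the output (the enum lists only a subset of replica states); they are not Ozone's own classes.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ReplicaCheck {
    // Hypothetical subset of the replica states printed by `ozone admin container info`.
    public enum ReplicaState { OPEN, CLOSING, QUASI_CLOSED, CLOSED, UNHEALTHY }

    // Hypothetical replica record: datanode UUID, host name, reported state.
    public static final class Replica {
        public final String datanodeId;
        public final String host;
        public final ReplicaState state;

        public Replica(String datanodeId, String host, ReplicaState state) {
            this.datanodeId = datanodeId;
            this.host = host;
            this.state = state;
        }
    }

    // Returns the replicas reporting UNHEALTHY. A non-empty result while the
    // container itself is still CLOSING matches the situation in this report.
    public static List<Replica> unhealthyReplicas(List<Replica> replicas) {
        return replicas.stream()
                .filter(r -> r.state == ReplicaState.UNHEALTHY)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Values copied from the container 24102 output above.
        List<Replica> replicas = List.of(
                new Replica("926178e6-69a1-41de-97d1-a619d9c8cb8a", "node1", ReplicaState.UNHEALTHY),
                new Replica("e751fa12-8be2-4ee4-9655-16ef7d8b1a69", "node2", ReplicaState.CLOSING),
                new Replica("0427b0f0-f9da-4aeb-9f4c-2f6887182085", "node3", ReplicaState.CLOSING));
        for (Replica r : unhealthyReplicas(replicas)) {
            // prints "node1 (926178e6-69a1-41de-97d1-a619d9c8cb8a) is UNHEALTHY"
            System.out.println(r.host + " (" + r.datanodeId + ") is " + r.state);
        }
    }
}
```

Per the KeyOutputStream warning further down, the flagged datanode (926178e6-69a1-41de-97d1-a619d9c8cb8a) is also the pipeline's leaderId.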
> {code:java}
> encodedToken: "VAoCb20SJmNvbk..3MTAzAAAA"
> version: 3
> , data.size=0
> java.util.concurrent.CompletionException: org.apache.ratis.protocol.exceptions.StateMachineException: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Put Key failed
> at org.apache.ratis.client.impl.RaftClientImpl.handleRaftException(RaftClientImpl.java:322)
> at org.apache.ratis.client.impl.OrderedAsync.lambda$send$3(OrderedAsync.java:172)
> at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
> at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
> at org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:98)
> at org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:59)
> at org.apache.ratis.util.SlidingWindow$RequestMap.setReply(SlidingWindow.java:144)
> at org.apache.ratis.util.SlidingWindow$Client.receiveReply(SlidingWindow.java:348)
> at org.apache.ratis.client.impl.OrderedAsync.lambda$sendRequest$9(OrderedAsync.java:248)
> at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
> at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
> at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.lambda$onNext$0(GrpcClientProtocolClient.java:324)
> at java.util.Optional.ifPresent(Optional.java:159)
> at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.handleReplyFuture(GrpcClientProtocolClient.java:380)
> at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$100(GrpcClientProtocolClient.java:302)
> at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:324)
> at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:307)
> at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:474)
> at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
> at org.apache.ratis.thirdparty.io.grpc.internal.DelayedClientCall$DelayedListener.onMessage(DelayedClientCall.java:455)
> at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:662)
> at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:647)
> at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
> at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:750)
> Caused by: org.apache.ratis.protocol.exceptions.StateMachineException: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Put Key failed
> at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction$11(ContainerStateMachine.java:996)
> at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
> at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
> at org.apache.ratis.util.TaskQueue.lambda$submit$0(TaskQueue.java:133)
> at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:38)
> at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:79)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ... 3 more
> 25/03/24 09:53:56 WARN io.KeyOutputStream: Encountered exception
> java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.CompletionException: java.util.concurrent.CompletionException: org.apache.ratis.protocol.exceptions.StateMachineException: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Put Key failed on the pipeline Pipeline[ Id: 7a7842b2-4f70-439d-a9ce-a743be876465, Nodes: e751fa12-8be2-4ee4-9655-16ef7d8b1a69(node1) ReplicaIndex: 00427b0f0-f9da-4aeb-9f4c-2f6887182085(node2) ReplicaIndex: 0926178e6-69a1-41de-97d1-a619d9c8cb8a(node3) ReplicaIndex: 0, ReplicationConfig: RATIS/THREE, State:OPEN, leaderId:926178e6-69a1-41de-97d1-a619d9c8cb8a, CreationTimestamp2025-03-24T09:53:46.176Z[Etc/UTC]]. The last committed block length is 0, uncommitted data length is 9613099 retry count 0
> {code}
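The trace shows the datanode-side `StorageContainerException` reaching the client wrapped in a `StateMachineException` and then in two `CompletionException`s, and the KeyOutputStream warning reports `retry count 0` for this attempt. A minimal sketch of the behaviour the report implies is missing: a retry loop that is bounded and that fails fast on a non-retriable root cause. It is hypothetical throughout: `BoundedRetry`, `writeWithBoundedRetry` and `ContainerUnhealthyException` are illustrative names (the last stands in for the real `StorageContainerException`), and the sketch does not reproduce Ozone's actual retry policy or pipeline-exclusion logic.

```java
import java.io.IOException;
import java.util.concurrent.CompletionException;
import java.util.function.Supplier;

public class BoundedRetry {
    // Hypothetical stand-in for the datanode's StorageContainerException ("Put Key failed").
    public static class ContainerUnhealthyException extends IOException {
        public ContainerUnhealthyException(String msg) {
            super(msg);
        }
    }

    // Follows getCause() to the innermost throwable, peeling off layers such as
    // CompletionException -> CompletionException -> StateMachineException.
    // The depth cap guards against pathological cause cycles.
    public static Throwable rootCause(Throwable t) {
        Throwable cur = t;
        for (int depth = 0; depth < 64 && cur.getCause() != null; depth++) {
            cur = cur.getCause();
        }
        return cur;
    }

    // Hypothetical policy: make at most maxAttempts attempts in total, but rethrow
    // at once when the root cause marks the container as unusable. Only
    // CompletionException counts as a failed attempt; other runtime exceptions propagate.
    public static <T> T writeWithBoundedRetry(Supplier<T> attempt, int maxAttempts)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.get();
            } catch (CompletionException e) {
                Throwable root = rootCause(e);
                if (root instanceof ContainerUnhealthyException) {
                    throw (ContainerUnhealthyException) root; // non-retriable: surface it
                }
                last = new IOException("attempt " + i + " failed", root);
            }
        }
        // Bounded: the caller always gets an error instead of hanging.
        throw new IOException("giving up after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        try {
            writeWithBoundedRetry(() -> {
                throw new CompletionException(new CompletionException(
                        new ContainerUnhealthyException("Put Key failed")));
            }, 5);
        } catch (IOException e) {
            System.out.println("failed fast: " + e.getMessage()); // prints "failed fast: Put Key failed"
        }
    }
}
```

Whether an UNHEALTHY replica should be treated as non-retriable, or should instead cause the client to exclude the container and allocate a new block, is a decision for the actual fix; the sketch only illustrates that either path has to terminate.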
--
This message was sent by Atlassian Jira
(v8.20.10#820010)