[ 
https://issues.apache.org/jira/browse/HDDS-12680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Kumar Asawa reassigned HDDS-12680:
------------------------------------------

    Assignee: Sarveksha Yeshavantha Raju

> Client hangs indefinitely on an UNHEALTHY container state
> ---------------------------------------------------------
>
>                 Key: HDDS-12680
>                 URL: https://issues.apache.org/jira/browse/HDDS-12680
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Client
>            Reporter: Soumitra Sulav
>            Assignee: Sarveksha Yeshavantha Raju
>            Priority: Major
>
> If a client encounters a container with an UNHEALTHY replica, it keeps
> retrying indefinitely.
> {code:java}
> # ozone admin container info 24102
> Container id: 24102
> Pipeline id: 7a7842b2-4f70-439d-a9ce-a743be876465
> Container State: CLOSING
> Datanodes: [node1, node2, node3]
> Replicas: [State: UNHEALTHY; ReplicaIndex: 0; Origin: 
> 926178e6-69a1-41de-97d1-a619d9c8cb8a; Location: 
> 926178e6-69a1-41de-97d1-a619d9c8cb8a/node1,
> State: CLOSING; ReplicaIndex: 0; Origin: 
> e751fa12-8be2-4ee4-9655-16ef7d8b1a69; Location: 
> e751fa12-8be2-4ee4-9655-16ef7d8b1a69/node2,
> State: CLOSING; ReplicaIndex: 0; Origin: 
> 0427b0f0-f9da-4aeb-9f4c-2f6887182085; Location: 
> 0427b0f0-f9da-4aeb-9f4c-2f6887182085/node3]
> {code}
> {code:java}
> encodedToken: "VAoCb20SJmNvbk..3MTAzAAAA"
> version: 3
> , data.size=0
> java.util.concurrent.CompletionException: 
> org.apache.ratis.protocol.exceptions.StateMachineException: 
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  Put Key failed
>       at 
> org.apache.ratis.client.impl.RaftClientImpl.handleRaftException(RaftClientImpl.java:322)
>       at 
> org.apache.ratis.client.impl.OrderedAsync.lambda$send$3(OrderedAsync.java:172)
>       at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
>       at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
>       at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>       at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>       at 
> org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:98)
>       at 
> org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:59)
>       at 
> org.apache.ratis.util.SlidingWindow$RequestMap.setReply(SlidingWindow.java:144)
>       at 
> org.apache.ratis.util.SlidingWindow$Client.receiveReply(SlidingWindow.java:348)
>       at 
> org.apache.ratis.client.impl.OrderedAsync.lambda$sendRequest$9(OrderedAsync.java:248)
>       at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
>       at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
>       at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>       at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>       at 
> org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.lambda$onNext$0(GrpcClientProtocolClient.java:324)
>       at java.util.Optional.ifPresent(Optional.java:159)
>       at 
> org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.handleReplyFuture(GrpcClientProtocolClient.java:380)
>       at 
> org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$100(GrpcClientProtocolClient.java:302)
>       at 
> org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:324)
>       at 
> org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:307)
>       at 
> org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:474)
>       at 
> org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
>       at 
> org.apache.ratis.thirdparty.io.grpc.internal.DelayedClientCall$DelayedListener.onMessage(DelayedClientCall.java:455)
>       at 
> org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:662)
>       at 
> org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:647)
>       at 
> org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>       at 
> org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:750)
> Caused by: org.apache.ratis.protocol.exceptions.StateMachineException: 
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  Put Key failed
>       at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction$11(ContainerStateMachine.java:996)
>       at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
>       at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
>       at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>       at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>       at org.apache.ratis.util.TaskQueue.lambda$submit$0(TaskQueue.java:133)
>       at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:38)
>       at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:79)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       ... 3 more
> 25/03/24 09:53:56 WARN io.KeyOutputStream: Encountered exception 
> java.io.IOException: Unexpected Storage Container Exception: 
> java.util.concurrent.CompletionException: 
> java.util.concurrent.CompletionException: 
> org.apache.ratis.protocol.exceptions.StateMachineException: 
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  Put Key failed on the pipeline Pipeline[ Id: 
> 7a7842b2-4f70-439d-a9ce-a743be876465, Nodes: 
> e751fa12-8be2-4ee4-9655-16ef7d8b1a69(node1) ReplicaIndex: 
> 00427b0f0-f9da-4aeb-9f4c-2f6887182085(node2) ReplicaIndex: 
> 0926178e6-69a1-41de-97d1-a619d9c8cb8a(node3) ReplicaIndex: 0, 
> ReplicationConfig: RATIS/THREE, State:OPEN, 
> leaderId:926178e6-69a1-41de-97d1-a619d9c8cb8a, 
> CreationTimestamp2025-03-24T09:53:46.176Z[Etc/UTC]]. The last committed block 
> length is 0, uncommitted data length is 9613099 retry count 0
> {code}



