Pratyush Bhatt created HDDS-10740:
-------------------------------------

             Summary: [Hbase-Ozone] HMaster down due to 
"ContainerNotOpenException: Container in CLOSED state"
                 Key: HDDS-10740
                 URL: https://issues.apache.org/jira/browse/HDDS-10740
             Project: Apache Ozone
          Issue Type: Bug
          Components: Ozone Datanode, SCM
            Reporter: Pratyush Bhatt


HMaster crashes abruptly. The logs show entries like the following just 
before the crash:
{code:java}
java.util.concurrent.CompletionException: org.apache.ratis.protocol.exceptions.StateMachineException: org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException from Server 50954181-a303-4e2f-aca5-c70f235191f1@group-9CCD951DCB08: Container 2061 in CLOSED state
{code}
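The message above nests three exception types: a CompletionException wrapping a StateMachineException wrapping a ContainerNotOpenException. As a rough illustration of how calling code could look through such wrappers to find the innermost cause, here is a minimal sketch. The class CauseChainSketch, the helper findCause, and the two nested exception classes are hypothetical stand-ins written for this example; they are not the Ratis or Ozone classes, and this is not how the Ozone client itself handles the error.

```java
import java.util.concurrent.CompletionException;

public class CauseChainSketch {
    // Hypothetical stand-ins for the Ratis and Ozone exception types named in the log.
    static class StateMachineException extends Exception {
        StateMachineException(String msg, Throwable cause) { super(msg, cause); }
    }

    static class ContainerNotOpenException extends Exception {
        ContainerNotOpenException(String msg) { super(msg); }
    }

    /**
     * Walks the cause chain and returns the first throwable whose simple
     * class name equals simpleName, or null if none matches.
     * The depth limit guards against a cyclic cause chain.
     */
    static Throwable findCause(Throwable t, String simpleName) {
        int depth = 0;
        for (Throwable cur = t; cur != null && depth < 32; cur = cur.getCause(), depth++) {
            if (cur.getClass().getSimpleName().equals(simpleName)) {
                return cur;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Rebuild the same nesting order that the log shows.
        Throwable root = new ContainerNotOpenException("Container 2061 in CLOSED state");
        Throwable wrapped = new CompletionException(
            new StateMachineException("wrapped by the state machine", root));

        Throwable found = findCause(wrapped, "ContainerNotOpenException");
        System.out.println(found == null ? "not found" : found.getMessage());
    }
}
```

Matching on the simple class name keeps the sketch free of Ratis and Ozone dependencies; code that has those libraries on the classpath would more naturally use instanceof against the real types.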
Full related log:
{code:java}
2024-04-22 08:05:26,709 ERROR org.apache.ratis.client.impl.OrderedAsync: Failed 
to send request, message=cmdType: WriteChunk
traceID: ""
containerID: 2061
datanodeUuid: "a3535b74-fc72-443e-b66d-cb0da825c469"
writeChunk {
  blockID {
    containerID: 2061
    localID: 113750153625619822
    blockCommitSequenceId: 18190201
    replicaIndex: 0
  }
  chunkData {
    chunkName: "113750153625619822_chunk_2699"
    offset: 1497659
    len: 98
    checksumData {
      type: CRC32
      bytesPerChecksum: 16384
      checksums: "U\321\246\212"
    }
  }
}
encodedToken: 
"VwoFaGJhc2USJWNvbklEOiAyMDYxIGxvY0lEOiAxMTM3NTAxNTM2MjU2MTk4MjIY5ILNz_AxKAEoAjCAgICAAToWCLud9efY8qvz5QEQlLemrcCwiumPASCvEZ-NqRpFuh6-H1ottQt1_14NiKrfTck8ZuC5FzTX6xBIRERTX0JMT0NLX1RPS0VOLGNvbklEOiAyMDYxIGxvY0lEOiAxMTM3NTAxNTM2MjU2MTk4MjIAAAAAAAAA"
, data.size=98
java.util.concurrent.CompletionException: 
org.apache.ratis.protocol.exceptions.StateMachineException: 
org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException 
from Server 50954181-a303-4e2f-aca5-c70f235191f1@group-9CCD951DCB08: Container 
2061 in CLOSED state
        at 
org.apache.ratis.client.impl.RaftClientImpl.handleRaftException(RaftClientImpl.java:374)
        at 
org.apache.ratis.client.impl.OrderedAsync.lambda$send$3(OrderedAsync.java:173)
        at 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
        at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
        at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
        at 
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
        at 
org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:99)
        at 
org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:60)
        at 
org.apache.ratis.util.SlidingWindow$RequestMap.setReply(SlidingWindow.java:144)
        at 
org.apache.ratis.util.SlidingWindow$Client.receiveReply(SlidingWindow.java:348)
        at 
org.apache.ratis.client.impl.OrderedAsync.lambda$sendRequest$8(OrderedAsync.java:243)
        at 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
        at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
        at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
        at 
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
        at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.lambda$onNext$0(GrpcClientProtocolClient.java:322)
        at java.util.Optional.ifPresent(Optional.java:159)
        at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.handleReplyFuture(GrpcClientProtocolClient.java:378)
        at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$100(GrpcClientProtocolClient.java:300)
        at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:322)
        at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:305)
        at 
org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:468)
        at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
        at 
org.apache.ratis.thirdparty.io.grpc.internal.DelayedClientCall$DelayedListener.onMessage(DelayedClientCall.java:473)
        at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:660)
        at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:647)
        at 
org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
        at 
org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.ratis.protocol.exceptions.StateMachineException: 
org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException 
from Server 50954181-a303-4e2f-aca5-c70f235191f1@group-9CCD951DCB08: Container 
2061 in CLOSED state
        at 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.validateContainerCommand(HddsDispatcher.java:560)
        at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.startTransaction(ContainerStateMachine.java:415)
        at 
org.apache.ratis.server.impl.RaftServerImpl.writeAsync(RaftServerImpl.java:941)
        at 
org.apache.ratis.server.impl.RaftServerImpl.replyFuture(RaftServerImpl.java:919)
        at 
org.apache.ratis.server.impl.RaftServerImpl.submitClientRequestAsync(RaftServerImpl.java:896)
        at 
org.apache.ratis.server.impl.RaftServerImpl.lambda$null$11(RaftServerImpl.java:885)
        at org.apache.ratis.util.JavaUtils.callAsUnchecked(JavaUtils.java:117)
        at 
org.apache.ratis.server.impl.RaftServerImpl.lambda$executeSubmitClientRequestAsync$12(RaftServerImpl.java:885)
        at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
        ... 3 more
Caused by: 
org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException: 
Container 2061 in CLOSED state
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at 
org.apache.ratis.util.ReflectionUtils.instantiateException(ReflectionUtils.java:259)
        at 
org.apache.ratis.client.impl.ClientProtoUtils.toStateMachineException(ClientProtoUtils.java:449)
        at 
org.apache.ratis.client.impl.ClientProtoUtils.toStateMachineException(ClientProtoUtils.java:435)
        at 
org.apache.ratis.client.impl.ClientProtoUtils.toRaftClientReply(ClientProtoUtils.java:402)
        at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:310)
        at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:305)
        at 
org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:468)
        at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
        at 
org.apache.ratis.thirdparty.io.grpc.internal.DelayedClientCall$DelayedListener.onMessage(DelayedClientCall.java:473)
        at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:660)
        at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:647)
        at 
org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
        at 
org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
        ... 3 more
2024-04-22 08:05:26,832 ERROR org.apache.ratis.client.impl.OrderedAsync: Failed 
to send request, message=cmdType: PutBlock
traceID: ""
containerID: 2061
datanodeUuid: "a3535b74-fc72-443e-b66d-cb0da825c469"
putBlock {
  blockData {
    blockID {
      containerID: 2061
      localID: 113750153625619822
      blockCommitSequenceId: 0
    }
    metadata {
      key: "TYPE"
      value: "KEY"
    }
    metadata {
      key: "incremental"
    }
    chunks {
      chunkName: "113750153625619822_chunk_0"
      offset: 0
      len: 1497757
      checksumData {
        type: CRC32
        bytesPerChecksum: 16384
        checksums: ".M]\274"
        checksums: "\341f@\350"
        checksums: "3\215@\243"
        checksums: "\027\220|\226"
        checksums: "xE8B"
        checksums: ",\300\263\233"
        checksums: "#\314\246x"
        checksums: "\313\220\211\362"
        checksums: "\337P6\004"
        checksums: "\351\334(\032"
        checksums: "l\315\005["
        checksums: "P\311\212\245"
        checksums: "\355R\361\235"
        checksums: "\256\341\206\304"
        checksums: "x\304\353\322"
        checksums: "q\257\337\027"
        checksums: "e\253\304\241"
        checksums: "Fy`6"
        checksums: "\351A\221\351"
        checksums: "\270\243\366T"
        checksums: "\246\264aN"
        checksums: "V`\033\003"
        checksums: " $F\214"
.
.
.
        checksums: "D5\360\350"
        checksums: "\360w\314X"
        checksums: "\350\025\003\263"
        checksums: "\347\310\334\215"
      }
    }
  }
  eof: false
}
encodedToken: 
"VwoFaGJhc2USJWNvbklEOiAyMDYxIGxvY0lEOiAxMTM3NTAxNTM2MjU2MTk4MjIY5ILNz_AxKAEoAjCAgICAAToWCLud9efY8qvz5QEQlLemrcCwiumPASCvEZ-NqRpFuh6-H1ottQt1_14NiKrfTck8ZuC5FzTX6xBIRERTX0JMT0NLX1RPS0VOLGNvbklEOiAyMDYxIGxvY0lEOiAxMTM3NTAxNTM2MjU2MTk4MjIAAAAAAAAA"
, data.size=0
java.util.concurrent.CompletionException: 
org.apache.ratis.protocol.exceptions.StateMachineException: 
org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException 
from Server 50954181-a303-4e2f-aca5-c70f235191f1@group-9CCD951DCB08: Container 
2061 in CLOSED state
        at 
org.apache.ratis.client.impl.RaftClientImpl.handleRaftException(RaftClientImpl.java:374)
        at 
org.apache.ratis.client.impl.OrderedAsync.lambda$send$3(OrderedAsync.java:173)
        at 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
        at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
        at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
        at 
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
        at 
org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:99)
        at 
org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:60)
        at 
org.apache.ratis.util.SlidingWindow$RequestMap.setReply(SlidingWindow.java:144)
        at 
org.apache.ratis.util.SlidingWindow$Client.receiveReply(SlidingWindow.java:348)
        at 
org.apache.ratis.client.impl.OrderedAsync.lambda$sendRequest$8(OrderedAsync.java:243)
        at 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
        at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
        at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
        at 
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
        at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.lambda$onNext$0(GrpcClientProtocolClient.java:322)
        at java.util.Optional.ifPresent(Optional.java:159)
        at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.handleReplyFuture(GrpcClientProtocolClient.java:378)
        at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$100(GrpcClientProtocolClient.java:300)
        at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:322)
        at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:305)
        at 
org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:468)
        at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
        at 
org.apache.ratis.thirdparty.io.grpc.internal.DelayedClientCall$DelayedListener.onMessage(DelayedClientCall.java:473)
        at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:660)
        at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:647)
        at 
org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
        at 
org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.ratis.protocol.exceptions.StateMachineException: 
org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException 
from Server 50954181-a303-4e2f-aca5-c70f235191f1@group-9CCD951DCB08: Container 
2061 in CLOSED state
        at 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.validateContainerCommand(HddsDispatcher.java:560)
        at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.startTransaction(ContainerStateMachine.java:415)
        at 
org.apache.ratis.server.impl.RaftServerImpl.writeAsync(RaftServerImpl.java:941)
        at 
org.apache.ratis.server.impl.RaftServerImpl.replyFuture(RaftServerImpl.java:919)
        at 
org.apache.ratis.server.impl.RaftServerImpl.submitClientRequestAsync(RaftServerImpl.java:896)
        at 
org.apache.ratis.server.impl.RaftServerImpl.lambda$null$11(RaftServerImpl.java:885)
        at org.apache.ratis.util.JavaUtils.callAsUnchecked(JavaUtils.java:117)
        at 
org.apache.ratis.server.impl.RaftServerImpl.lambda$executeSubmitClientRequestAsync$12(RaftServerImpl.java:885)
        at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
        ... 3 more
Caused by: 
org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException: 
Container 2061 in CLOSED state
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at 
org.apache.ratis.util.ReflectionUtils.instantiateException(ReflectionUtils.java:259)
        at 
org.apache.ratis.client.impl.ClientProtoUtils.toStateMachineException(ClientProtoUtils.java:449)
        at 
org.apache.ratis.client.impl.ClientProtoUtils.toStateMachineException(ClientProtoUtils.java:435)
        at 
org.apache.ratis.client.impl.ClientProtoUtils.toRaftClientReply(ClientProtoUtils.java:402)
        at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:310)
        at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:305)
        at 
org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:468)
        at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
        at 
org.apache.ratis.thirdparty.io.grpc.internal.DelayedClientCall$DelayedListener.onMessage(DelayedClientCall.java:473)
        at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:660)
        at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:647)
        at 
org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
        at 
org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
        ... 3 more
2024-04-22 08:05:30,039 WARN org.apache.ratis.grpc.GrpcUtil: Timed out 
gracefully shutting down connection: 
ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=191, 
target=10.140.52.141:9858}}. {code}
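When triaging a long log like the one above, it can help to tally which container IDs and states appear in the "Container N in STATE state" messages. The following is a minimal sketch under stated assumptions: the class ContainerStateScan and its count method are hypothetical helpers written for this example, and the pattern is inferred only from the message text in this log, not from any documented Ozone log format.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContainerStateScan {
    // \s+ also matches line breaks, so messages that were wrapped
    // across two lines (as in this email) are still recognised.
    private static final Pattern MSG =
        Pattern.compile("Container\\s+(\\d+)\\s+in\\s+([A-Z_]+)\\s+state");

    /** Counts occurrences per "containerId:STATE" key, in order of first appearance. */
    static Map<String, Integer> count(String logText) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = MSG.matcher(logText);
        while (m.find()) {
            counts.merge(m.group(1) + ":" + m.group(2), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two fragments in the shape seen in the log, one of them line-wrapped.
        String sample =
            "ContainerNotOpenException from Server x: Container \n2061 in CLOSED state\n"
            + "Caused by: ContainerNotOpenException: \nContainer 2061 in CLOSED state\n";
        System.out.println(count(sample));
    }
}
```

Because every "Caused by" line repeats the message, the counts reflect how often the message is printed, not how many distinct requests failed.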
About two and a half minutes later, the HMaster aborts with "Log rolling failed":
{code:java}
2024-04-22 08:08:02,026 ERROR org.apache.hadoop.hbase.master.HMaster: ***** 
ABORTING master ccycloud-7.ozn-hb973chf3oz.xyz,22001,1713770648404: Log rolling 
failed *****
java.lang.RuntimeException
        at 
org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.writeWALMetadata(AsyncProtobufLogWriter.java:217)
        at 
org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.writeMagicAndWALHeader(AsyncProtobufLogWriter.java:223)
        at 
org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:164)
        at 
org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createAsyncWriter(AsyncFSWALProvider.java:116)
        at 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:726)
        at 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:129)
        at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:886)
        at 
org.apache.hadoop.hbase.wal.AbstractWALRoller$RollController.rollWal(AbstractWALRoller.java:304)
        at 
org.apache.hadoop.hbase.wal.AbstractWALRoller.run(AbstractWALRoller.java:211)
2024-04-22 08:08:02,034 INFO org.apache.ranger.plugin.util.PolicyRefresher: 
PolicyRefresher(serviceName=cm_hbase).run(): interrupted! Exiting thread
java.lang.InterruptedException
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
        at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at 
org.apache.ranger.plugin.util.PolicyRefresher.run(PolicyRefresher.java:208)
2024-04-22 08:08:02,037 INFO 
org.apache.ranger.audit.provider.AuditProviderFactory: ==> 
JVMShutdownHook.run() {code}


