Pratyush Bhatt created HDDS-10740:
-------------------------------------
Summary: [Hbase-Ozone] HMaster down due to
"ContainerNotOpenException: Container in CLOSED state"
Key: HDDS-10740
URL: https://issues.apache.org/jira/browse/HDDS-10740
Project: Apache Ozone
Issue Type: Bug
Components: Ozone Datanode, SCM
Reporter: Pratyush Bhatt
HMaster abruptly crashes down, checked the logs, just before the crash logs
like this are there:
{code:java}
java.util.concurrent.CompletionException:
org.apache.ratis.protocol.exceptions.StateMachineException:
org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException
from Server 50954181-a303-4e2f-aca5-c70f235191f1@group-9CCD951DCB08: Container
2061 in CLOSED state {code}
Full related log:
{code:java}
2024-04-22 08:05:26,709 ERROR org.apache.ratis.client.impl.OrderedAsync: Failed
to send request, message=cmdType: WriteChunk
traceID: ""
containerID: 2061
datanodeUuid: "a3535b74-fc72-443e-b66d-cb0da825c469"
writeChunk {
blockID {
containerID: 2061
localID: 113750153625619822
blockCommitSequenceId: 18190201
replicaIndex: 0
}
chunkData {
chunkName: "113750153625619822_chunk_2699"
offset: 1497659
len: 98
checksumData {
type: CRC32
bytesPerChecksum: 16384
checksums: "U\321\246\212"
}
}
}
encodedToken:
"VwoFaGJhc2USJWNvbklEOiAyMDYxIGxvY0lEOiAxMTM3NTAxNTM2MjU2MTk4MjIY5ILNz_AxKAEoAjCAgICAAToWCLud9efY8qvz5QEQlLemrcCwiumPASCvEZ-NqRpFuh6-H1ottQt1_14NiKrfTck8ZuC5FzTX6xBIRERTX0JMT0NLX1RPS0VOLGNvbklEOiAyMDYxIGxvY0lEOiAxMTM3NTAxNTM2MjU2MTk4MjIAAAAAAAAA"
, data.size=98
java.util.concurrent.CompletionException:
org.apache.ratis.protocol.exceptions.StateMachineException:
org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException
from Server 50954181-a303-4e2f-aca5-c70f235191f1@group-9CCD951DCB08: Container
2061 in CLOSED state
at
org.apache.ratis.client.impl.RaftClientImpl.handleRaftException(RaftClientImpl.java:374)
at
org.apache.ratis.client.impl.OrderedAsync.lambda$send$3(OrderedAsync.java:173)
at
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
at
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
at
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
at
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
at
org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:99)
at
org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:60)
at
org.apache.ratis.util.SlidingWindow$RequestMap.setReply(SlidingWindow.java:144)
at
org.apache.ratis.util.SlidingWindow$Client.receiveReply(SlidingWindow.java:348)
at
org.apache.ratis.client.impl.OrderedAsync.lambda$sendRequest$8(OrderedAsync.java:243)
at
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
at
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
at
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
at
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
at
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.lambda$onNext$0(GrpcClientProtocolClient.java:322)
at java.util.Optional.ifPresent(Optional.java:159)
at
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.handleReplyFuture(GrpcClientProtocolClient.java:378)
at
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$100(GrpcClientProtocolClient.java:300)
at
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:322)
at
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:305)
at
org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:468)
at
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
at
org.apache.ratis.thirdparty.io.grpc.internal.DelayedClientCall$DelayedListener.onMessage(DelayedClientCall.java:473)
at
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:660)
at
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:647)
at
org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at
org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.ratis.protocol.exceptions.StateMachineException:
org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException
from Server 50954181-a303-4e2f-aca5-c70f235191f1@group-9CCD951DCB08: Container
2061 in CLOSED state
at
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.validateContainerCommand(HddsDispatcher.java:560)
at
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.startTransaction(ContainerStateMachine.java:415)
at
org.apache.ratis.server.impl.RaftServerImpl.writeAsync(RaftServerImpl.java:941)
at
org.apache.ratis.server.impl.RaftServerImpl.replyFuture(RaftServerImpl.java:919)
at
org.apache.ratis.server.impl.RaftServerImpl.submitClientRequestAsync(RaftServerImpl.java:896)
at
org.apache.ratis.server.impl.RaftServerImpl.lambda$null$11(RaftServerImpl.java:885)
at org.apache.ratis.util.JavaUtils.callAsUnchecked(JavaUtils.java:117)
at
org.apache.ratis.server.impl.RaftServerImpl.lambda$executeSubmitClientRequestAsync$12(RaftServerImpl.java:885)
at
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
... 3 more
Caused by:
org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException:
Container 2061 in CLOSED state
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at
org.apache.ratis.util.ReflectionUtils.instantiateException(ReflectionUtils.java:259)
at
org.apache.ratis.client.impl.ClientProtoUtils.toStateMachineException(ClientProtoUtils.java:449)
at
org.apache.ratis.client.impl.ClientProtoUtils.toStateMachineException(ClientProtoUtils.java:435)
at
org.apache.ratis.client.impl.ClientProtoUtils.toRaftClientReply(ClientProtoUtils.java:402)
at
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:310)
at
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:305)
at
org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:468)
at
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
at
org.apache.ratis.thirdparty.io.grpc.internal.DelayedClientCall$DelayedListener.onMessage(DelayedClientCall.java:473)
at
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:660)
at
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:647)
at
org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at
org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
... 3 more
2024-04-22 08:05:26,832 ERROR org.apache.ratis.client.impl.OrderedAsync: Failed
to send request, message=cmdType: PutBlock
traceID: ""
containerID: 2061
datanodeUuid: "a3535b74-fc72-443e-b66d-cb0da825c469"
putBlock {
blockData {
blockID {
containerID: 2061
localID: 113750153625619822
blockCommitSequenceId: 0
}
metadata {
key: "TYPE"
value: "KEY"
}
metadata {
key: "incremental"
}
chunks {
chunkName: "113750153625619822_chunk_0"
offset: 0
len: 1497757
checksumData {
type: CRC32
bytesPerChecksum: 16384
checksums: ".M]\274"
checksums: "\341f@\350"
checksums: "3\215@\243"
checksums: "\027\220|\226"
checksums: "xE8B"
checksums: ",\300\263\233"
checksums: "#\314\246x"
checksums: "\313\220\211\362"
checksums: "\337P6\004"
checksums: "\351\334(\032"
checksums: "l\315\005["
checksums: "P\311\212\245"
checksums: "\355R\361\235"
checksums: "\256\341\206\304"
checksums: "x\304\353\322"
checksums: "q\257\337\027"
checksums: "e\253\304\241"
checksums: "Fy`6"
checksums: "\351A\221\351"
checksums: "\270\243\366T"
checksums: "\246\264aN"
checksums: "V`\033\003"
checksums: " $F\214"
.
.
.
checksums: "D5\360\350"
checksums: "\360w\314X"
checksums: "\350\025\003\263"
checksums: "\347\310\334\215"
}
}
}
eof: false
}
encodedToken:
"VwoFaGJhc2USJWNvbklEOiAyMDYxIGxvY0lEOiAxMTM3NTAxNTM2MjU2MTk4MjIY5ILNz_AxKAEoAjCAgICAAToWCLud9efY8qvz5QEQlLemrcCwiumPASCvEZ-NqRpFuh6-H1ottQt1_14NiKrfTck8ZuC5FzTX6xBIRERTX0JMT0NLX1RPS0VOLGNvbklEOiAyMDYxIGxvY0lEOiAxMTM3NTAxNTM2MjU2MTk4MjIAAAAAAAAA"
, data.size=0
java.util.concurrent.CompletionException:
org.apache.ratis.protocol.exceptions.StateMachineException:
org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException
from Server 50954181-a303-4e2f-aca5-c70f235191f1@group-9CCD951DCB08: Container
2061 in CLOSED state
at
org.apache.ratis.client.impl.RaftClientImpl.handleRaftException(RaftClientImpl.java:374)
at
org.apache.ratis.client.impl.OrderedAsync.lambda$send$3(OrderedAsync.java:173)
at
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
at
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
at
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
at
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
at
org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:99)
at
org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:60)
at
org.apache.ratis.util.SlidingWindow$RequestMap.setReply(SlidingWindow.java:144)
at
org.apache.ratis.util.SlidingWindow$Client.receiveReply(SlidingWindow.java:348)
at
org.apache.ratis.client.impl.OrderedAsync.lambda$sendRequest$8(OrderedAsync.java:243)
at
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
at
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
at
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
at
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
at
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.lambda$onNext$0(GrpcClientProtocolClient.java:322)
at java.util.Optional.ifPresent(Optional.java:159)
at
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.handleReplyFuture(GrpcClientProtocolClient.java:378)
at
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$100(GrpcClientProtocolClient.java:300)
at
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:322)
at
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:305)
at
org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:468)
at
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
at
org.apache.ratis.thirdparty.io.grpc.internal.DelayedClientCall$DelayedListener.onMessage(DelayedClientCall.java:473)
at
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:660)
at
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:647)
at
org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at
org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.ratis.protocol.exceptions.StateMachineException:
org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException
from Server 50954181-a303-4e2f-aca5-c70f235191f1@group-9CCD951DCB08: Container
2061 in CLOSED state
at
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.validateContainerCommand(HddsDispatcher.java:560)
at
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.startTransaction(ContainerStateMachine.java:415)
at
org.apache.ratis.server.impl.RaftServerImpl.writeAsync(RaftServerImpl.java:941)
at
org.apache.ratis.server.impl.RaftServerImpl.replyFuture(RaftServerImpl.java:919)
at
org.apache.ratis.server.impl.RaftServerImpl.submitClientRequestAsync(RaftServerImpl.java:896)
at
org.apache.ratis.server.impl.RaftServerImpl.lambda$null$11(RaftServerImpl.java:885)
at org.apache.ratis.util.JavaUtils.callAsUnchecked(JavaUtils.java:117)
at
org.apache.ratis.server.impl.RaftServerImpl.lambda$executeSubmitClientRequestAsync$12(RaftServerImpl.java:885)
at
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
... 3 more
Caused by:
org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException:
Container 2061 in CLOSED state
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at
org.apache.ratis.util.ReflectionUtils.instantiateException(ReflectionUtils.java:259)
at
org.apache.ratis.client.impl.ClientProtoUtils.toStateMachineException(ClientProtoUtils.java:449)
at
org.apache.ratis.client.impl.ClientProtoUtils.toStateMachineException(ClientProtoUtils.java:435)
at
org.apache.ratis.client.impl.ClientProtoUtils.toRaftClientReply(ClientProtoUtils.java:402)
at
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:310)
at
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:305)
at
org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:468)
at
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
at
org.apache.ratis.thirdparty.io.grpc.internal.DelayedClientCall$DelayedListener.onMessage(DelayedClientCall.java:473)
at
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:660)
at
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:647)
at
org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at
org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
... 3 more
2024-04-22 08:05:30,039 WARN org.apache.ratis.grpc.GrpcUtil: Timed out
gracefully shutting down connection:
ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=191,
target=10.140.52.141:9858}}. {code}
And then the Master goes down:
{code:java}
2024-04-22 08:08:02,026 ERROR org.apache.hadoop.hbase.master.HMaster: *****
ABORTING master ccycloud-7.ozn-hb973chf3oz.xyz,22001,1713770648404: Log rolling
failed *****
java.lang.RuntimeException
at
org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.writeWALMetadata(AsyncProtobufLogWriter.java:217)
at
org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.writeMagicAndWALHeader(AsyncProtobufLogWriter.java:223)
at
org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:164)
at
org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createAsyncWriter(AsyncFSWALProvider.java:116)
at
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:726)
at
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:129)
at
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:886)
at
org.apache.hadoop.hbase.wal.AbstractWALRoller$RollController.rollWal(AbstractWALRoller.java:304)
at
org.apache.hadoop.hbase.wal.AbstractWALRoller.run(AbstractWALRoller.java:211)
2024-04-22 08:08:02,034 INFO org.apache.ranger.plugin.util.PolicyRefresher:
PolicyRefresher(serviceName=cm_hbase).run(): interrupted! Exiting thread
java.lang.InterruptedException
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
at
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at
org.apache.ranger.plugin.util.PolicyRefresher.run(PolicyRefresher.java:208)
2024-04-22 08:08:02,037 INFO
org.apache.ranger.audit.provider.AuditProviderFactory: ==>
JVMShutdownHook.run() {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]