[
https://issues.apache.org/jira/browse/HDDS-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629825#comment-17629825
]
Jyotirmoy Sinha commented on HDDS-7374:
---------------------------------------
[~hemantk] , Please find steps below to reproduce :
# Build Ozone project from HDDS-6517-Snapshot branch, and compose docker with
3 datanodes.
{code:java}
jsinha-terminal ~ % docker ps
CONTAINER ID IMAGE COMMAND
CREATED STATUS PORTS
NAMES
930dc41ce4fa apache/ozone-runner:20220623-1 "/usr/local/bin/dumb…" 7 days
ago Up 24 seconds 0.0.0.0:9860->9860/tcp, 0.0.0.0:9876->9876/tcp
ozone_scm_1
f60016bb8d58 apache/ozone-runner:20220623-1 "/usr/local/bin/dumb…" 7 days
ago Up 12 seconds 0.0.0.0:9878->9878/tcp
ozone_s3g_1
a0217284fee3 apache/ozone-runner:20220623-1 "/usr/local/bin/dumb…" 7 days
ago Up 21 seconds 0.0.0.0:59822->9864/tcp, 0.0.0.0:59821->9882/tcp
ozone_datanode_3
fac16315fc57 apache/ozone-runner:20220623-1 "/usr/local/bin/dumb…" 7 days
ago Up 20 seconds 0.0.0.0:59823->9864/tcp, 0.0.0.0:59824->9882/tcp
ozone_datanode_1
6becc941cbb6 apache/ozone-runner:20220623-1 "/usr/local/bin/dumb…" 7 days
ago Up 19 seconds 0.0.0.0:9888->9888/tcp
ozone_recon_1
88fec0e0cf3b apache/ozone-runner:20220623-1 "/usr/local/bin/dumb…" 7 days
ago Up 10 seconds 0.0.0.0:9862->9862/tcp, 0.0.0.0:9874->9874/tcp
ozone_om_1
b46057be7cf7 apache/ozone-runner:20220623-1 "/usr/local/bin/dumb…" 7 days
ago Up 16 seconds 0.0.0.0:59825->9864/tcp, 0.0.0.0:59826->9882/tcp
ozone_datanode_2 {code}
# Login to om docker, run create volume, bucket, key and then create snapshot
of the same.
# Leave the system idle for 1-2 hours, the om docker process exits along with
the scm service.
OM docker logs -
{code:java}
2022-11-07 10:40:38,256 [OM StateMachine ApplyTransaction Thread - 0] INFO
volume.OMVolumeCreateRequest: created volume:vol1 for user:hadoop
2022-11-07 10:40:50,179 [OM StateMachine ApplyTransaction Thread - 0] INFO
bucket.OMBucketCreateRequest: created bucket: buck1 of layout LEGACY in volume:
vol1
2022-11-07 10:41:30,737 [OM StateMachine ApplyTransaction Thread - 0] INFO
snapshot.OMSnapshotCreateRequest: created snapshot: name snap1 in snapshotPath:
vol1/buck1
2022-11-07 10:41:30,847 [OMDoubleBufferFlushThread] INFO
db.RDBCheckpointManager: Created checkpoint at
/data/metadata/db.snapshots/om.db-84b6bfc3-91e6-4723-a81f-6ad1bf891676 in 103
milliseconds
2022-11-07 10:41:30,851 [OMDoubleBufferFlushThread] WARN
rocksdiff.RocksDBCheckpointDiffer: Compaction log exists:
/data/metadata/compaction-log/0000000000000000000.log. Will append
2022-11-07 10:41:45,529 [IPC Server handler 24 on default port 9862] WARN
db.RDBStore: Unable to get delta updates since sequenceNumber 5. This exception
will be thrown to the client
org.apache.hadoop.hdds.utils.db.SequenceNumberNotFoundException: Unable to read
data from RocksDB wal to get delta updates. It may have already been flushed to
SSTs.
at
org.apache.hadoop.hdds.utils.db.RDBStore.getUpdatesSince(RDBStore.java:383)
at
org.apache.hadoop.ozone.om.OzoneManager.getDBUpdates(OzoneManager.java:3810)
at
org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.getOMDBUpdates(OzoneManagerRequestHandler.java:321)
at
org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleReadRequest(OzoneManagerRequestHandler.java:223)
at
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitReadRequestToOM(OzoneManagerProtocolServerSideTranslatorPB.java:226)
at
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:175)
at
org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
at
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:147)
at
org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server.processCall(ProtobufRpcEngine.java:466)
at
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:574)
at
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:552)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1093)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1035)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:963)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2966)
2022-11-07 10:41:45,600 [qtp961708482-52] INFO utils.DBCheckpointServlet:
Received request to obtain DB checkpoint snapshot
2022-11-07 10:41:45,620 [qtp961708482-52] INFO db.RDBCheckpointManager: Created
checkpoint at /data/metadata/db.checkpoints/om.db_checkpoint_1667817705601 in
18 milliseconds
2022-11-07 10:41:45,650 [qtp961708482-52] INFO utils.DBCheckpointServlet: Time
taken to write the checkpoint to response output stream: 29 milliseconds
2022-11-07 10:41:45,651 [qtp961708482-52] INFO db.RocksDBCheckpoint: Cleaning
up RocksDB checkpoint at
/data/metadata/db.checkpoints/om.db_checkpoint_1667817705601
2022-11-07 10:42:03,945 [IPC Server handler 12 on default port 9862] WARN
server.ServerUtils: ozone.om.db.dirs is not configured. We recommend adding
this setting. Falling back to ozone.metadata.dirs instead.
2022-11-07 10:42:03,949 [IPC Server handler 12 on default port 9862] INFO
codec.OmKeyInfoCodec: OmKeyInfoCodec ignorePipeline = true
2022-11-07 10:42:03,950 [IPC Server handler 12 on default port 9862] INFO
codec.RepeatedOmKeyInfoCodec: RepeatedOmKeyInfoCodec ignorePipeline = true
2022-11-07 10:42:04,087 [IPC Server handler 12 on default port 9862] INFO
audit.AuditLogger: Refresh DebugCmdSet for OMAudit to [].
2022-11-07 11:01:46,603
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$401/0x00000008405e0040@33ebe4f0]
WARN util.JvmPauseMonitor: JvmPauseMonitor-om1: Detected pause in JVM or host
machine (eg GC): pause of approximately 33203468540ns. No GCs detected.
2022-11-07 11:01:46,683
[org.apache.hadoop.util.JvmPauseMonitor$Monitor@2bcb1414] WARN
util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of
approximately 33582ms
No GCs detected
2022-11-07 11:01:46,716 [om1@group-C5BA1605619E-LeaderStateImpl] INFO
server.RaftServer$Division: om1@group-C5BA1605619E: changes role from LEADER
to FOLLOWER at term 1 for StepDownReason:JVM_PAUSE
2022-11-07 11:01:46,728 [om1@group-C5BA1605619E-LeaderStateImpl] INFO
impl.RoleInfo: om1: shutdown om1@group-C5BA1605619E-LeaderStateImpl
2022-11-07 11:01:46,737 [om1@group-C5BA1605619E-LeaderStateImpl] INFO
impl.PendingRequests: om1@group-C5BA1605619E-PendingRequests:
sendNotLeaderResponses
2022-11-07 11:01:46,772 [om1@group-C5BA1605619E-LeaderStateImpl] INFO
impl.RoleInfo: om1: start om1@group-C5BA1605619E-FollowerState
2022-11-07 11:01:51,816 [om1@group-C5BA1605619E-FollowerState] INFO
impl.FollowerState: om1@group-C5BA1605619E-FollowerState: change to CANDIDATE,
lastRpcElapsedTime:5046670744ns, electionTimeout:5036ms
2022-11-07 11:01:51,817 [om1@group-C5BA1605619E-FollowerState] INFO
impl.RoleInfo: om1: shutdown om1@group-C5BA1605619E-FollowerState
2022-11-07 11:01:51,817 [om1@group-C5BA1605619E-FollowerState] INFO
server.RaftServer$Division: om1@group-C5BA1605619E: changes role from FOLLOWER
to CANDIDATE at term 1 for changeToCandidate
2022-11-07 11:01:51,819 [om1@group-C5BA1605619E-FollowerState] INFO
server.RaftServerConfigKeys: raft.server.leaderelection.pre-vote = false
(custom)
2022-11-07 11:01:51,820 [om1@group-C5BA1605619E-FollowerState] INFO
impl.RoleInfo: om1: start om1@group-C5BA1605619E-LeaderElection2
2022-11-07 11:01:51,824 [om1@group-C5BA1605619E-LeaderElection2] INFO
server.RaftServer$Division: om1@group-C5BA1605619E: change Leader from om1 to
null at term 1 for ELECTION
2022-11-07 11:01:51,835 [om1@group-C5BA1605619E-LeaderElection2] INFO
impl.LeaderElection: om1@group-C5BA1605619E-LeaderElection2 ELECTION round 0:
submit vote requests at term 2 for 0:
[om1|rpc:om:9872|admin:|client:|dataStream:|priority:0], old=null
2022-11-07 11:01:51,837 [om1@group-C5BA1605619E-LeaderElection2] INFO
impl.LeaderElection: om1@group-C5BA1605619E-LeaderElection2 ELECTION round 0:
result PASSED (term=2)
2022-11-07 11:01:51,837 [om1@group-C5BA1605619E-LeaderElection2] INFO
impl.RoleInfo: om1: shutdown om1@group-C5BA1605619E-LeaderElection2
2022-11-07 11:01:51,839 [om1@group-C5BA1605619E-LeaderElection2] INFO
server.RaftServer$Division: om1@group-C5BA1605619E: changes role from CANDIDATE
to LEADER at term 2 for changeToLeader
2022-11-07 11:01:51,839 [om1@group-C5BA1605619E-LeaderElection2] INFO
server.RaftServer$Division: om1@group-C5BA1605619E: change Leader from null to
om1 at term 2 for becomeLeader, leader elected after 15ms
2022-11-07 11:01:51,840 [om1@group-C5BA1605619E-LeaderElection2] INFO
server.RaftServerConfigKeys: raft.server.staging.catchup.gap = 1000 (default)
2022-11-07 11:01:51,843 [om1@group-C5BA1605619E-LeaderElection2] INFO
server.RaftServerConfigKeys: raft.server.write.element-limit = 4096 (default)
2022-11-07 11:01:51,843 [om1@group-C5BA1605619E-LeaderElection2] INFO
server.RaftServerConfigKeys: raft.server.write.byte-limit = 64MB (=67108864)
(default)
2022-11-07 11:01:51,845 [om1@group-C5BA1605619E-LeaderElection2] INFO
server.RaftServerConfigKeys: raft.server.watch.timeout = 10s (default)
2022-11-07 11:01:51,845 [om1@group-C5BA1605619E-LeaderElection2] INFO
server.RaftServerConfigKeys: raft.server.watch.timeout.denomination = 1s
(default)
2022-11-07 11:01:51,846 [om1@group-C5BA1605619E-LeaderElection2] INFO
server.RaftServerConfigKeys: raft.server.watch.element-limit = 65536 (default)
2022-11-07 11:01:51,848 [om1@group-C5BA1605619E-LeaderElection2] INFO
server.RaftServerConfigKeys: raft.server.write.element-limit = 4096 (default)
2022-11-07 11:01:51,848 [om1@group-C5BA1605619E-LeaderElection2] INFO
server.RaftServerConfigKeys: raft.server.write.follower.gap.ratio.max = -1.0
(default)
2022-11-07 11:01:51,848 [om1@group-C5BA1605619E-LeaderElection2] INFO
impl.RoleInfo: om1: start om1@group-C5BA1605619E-LeaderStateImpl
2022-11-07 11:01:51,855 [om1@group-C5BA1605619E-LeaderElection2] INFO
segmented.SegmentedRaftLogWorker:
om1@group-C5BA1605619E-SegmentedRaftLogWorker: Rolling segment log-0_10 to
index:10
2022-11-07 11:01:51,860 [om1@group-C5BA1605619E-LeaderElection2] INFO
server.RaftServer$Division: om1@group-C5BA1605619E: set configuration 11:
[om1|rpc:om:9872|admin:|client:|dataStream:|priority:0], old=null
2022-11-07 11:01:51,879 [om1@group-C5BA1605619E-SegmentedRaftLogWorker] INFO
segmented.SegmentedRaftLogWorker:
om1@group-C5BA1605619E-SegmentedRaftLogWorker: Rolled log segment from
/data/metadata/ratis/bf265839-605b-3f16-9796-c5ba1605619e/current/log_inprogress_0
to /data/metadata/ratis/bf265839-605b-3f16-9796-c5ba1605619e/current/log_0-10
2022-11-07 11:01:51,930 [om1@group-C5BA1605619E-SegmentedRaftLogWorker] INFO
segmented.SegmentedRaftLogWorker:
om1@group-C5BA1605619E-SegmentedRaftLogWorker: created new log segment
/data/metadata/ratis/bf265839-605b-3f16-9796-c5ba1605619e/current/log_inprogress_11
2022-11-07 11:01:51,945 [om1@group-C5BA1605619E-StateMachineUpdater] INFO
ratis.OzoneManagerStateMachine: Received Configuration change notification from
Ratis. New Peer list:
[id: "om1"
address: "om:9872"
]
2022-11-07 11:52:49,943
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$401/0x00000008405e0040@33ebe4f0]
WARN util.JvmPauseMonitor: JvmPauseMonitor-om1: Detected pause in JVM or host
machine (eg GC): pause of approximately 154654366ns. No GCs detected.
2022-11-07 13:36:12,845
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$401/0x00000008405e0040@33ebe4f0]
WARN util.JvmPauseMonitor: JvmPauseMonitor-om1: Detected pause in JVM or host
machine (eg GC): pause of approximately 207881497875ns. No GCs detected.
2022-11-07 13:36:12,904
[org.apache.hadoop.util.JvmPauseMonitor$Monitor@2bcb1414] WARN
util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of
approximately 207133ms
No GCs detected
2022-11-07 13:36:13,100
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$401/0x00000008405e0040@33ebe4f0]
INFO server.RaftServer: om1: close
2022-11-07 13:36:13,541
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$401/0x00000008405e0040@33ebe4f0]
INFO server.RaftServer$Division: om1@group-C5BA1605619E: shutdown
2022-11-07 13:36:13,662
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$401/0x00000008405e0040@33ebe4f0]
INFO util.JmxRegister: Successfully un-registered JMX Bean with object name
Ratis:service=RaftServer,group=group-C5BA1605619E,id=om1
2022-11-07 13:36:13,667
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$401/0x00000008405e0040@33ebe4f0]
INFO impl.RoleInfo: om1: shutdown om1@group-C5BA1605619E-LeaderStateImpl
2022-11-07 13:36:13,717
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$401/0x00000008405e0040@33ebe4f0]
INFO impl.PendingRequests: om1@group-C5BA1605619E-PendingRequests:
sendNotLeaderResponses
2022-11-07 13:36:13,745
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$401/0x00000008405e0040@33ebe4f0]
INFO impl.StateMachineUpdater: om1@group-C5BA1605619E-StateMachineUpdater: set
stopIndex = 12
2022-11-07 13:36:13,934 [om1@group-C5BA1605619E-StateMachineUpdater] INFO
ratis.OzoneManagerStateMachine: Current Snapshot Index (t:2, i:12)
2022-11-07 13:36:14,156 [om1@group-C5BA1605619E-StateMachineUpdater] INFO
impl.StateMachineUpdater: om1@group-C5BA1605619E-StateMachineUpdater: Took a
snapshot at index 12
2022-11-07 13:36:14,169 [om1@group-C5BA1605619E-StateMachineUpdater] INFO
impl.StateMachineUpdater: om1@group-C5BA1605619E-StateMachineUpdater:
snapshotIndex: updateIncreasingly -1 -> 12
2022-11-07 13:36:14,169 [om1@group-C5BA1605619E-StateMachineUpdater] INFO
ratis.OzoneManagerStateMachine: StateMachine has shutdown. Shutdown
OzoneManager if not already shutdown.
2022-11-07 13:36:14,178 [om1@group-C5BA1605619E-StateMachineUpdater] INFO
om.OzoneManager: om1[om:9862]: Stopping Ozone Manager
2022-11-07 13:36:14,195 [om1@group-C5BA1605619E-StateMachineUpdater] INFO
ipc.Server: Stopping server on 9862
2022-11-07 13:36:14,531 [IPC Server listener on 9862] INFO ipc.Server: Stopping
IPC Server listener on 9862
2022-11-07 13:36:14,686 [IPC Server Responder] INFO ipc.Server: Stopping IPC
Server Responder
2022-11-07 13:36:15,163 [om1@group-C5BA1605619E-StateMachineUpdater] INFO
om.GrpcOzoneManagerServer: Server GrpcOzoneManagerServer is shutdown
2022-11-07 13:36:15,165
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$401/0x00000008405e0040@33ebe4f0]
WARN server.RaftServer$Division: om1@group-C5BA1605619E: Interrupted when
joining stateMachineUpdater
java.lang.InterruptedException
at java.base/java.lang.Object.wait(Native Method)
at java.base/java.lang.Thread.join(Thread.java:1300)
at java.base/java.lang.Thread.join(Thread.java:1375)
at
org.apache.ratis.server.impl.StateMachineUpdater.stopAndJoin(StateMachineUpdater.java:155)
at org.apache.ratis.server.impl.ServerState.close(ServerState.java:424)
at
org.apache.ratis.server.impl.RaftServerImpl.lambda$close$4(RaftServerImpl.java:456)
at
org.apache.ratis.util.LifeCycle.lambda$checkStateAndClose$4(LifeCycle.java:299)
at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:319)
at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:297)
at
org.apache.ratis.server.impl.RaftServerImpl.close(RaftServerImpl.java:433)
at
org.apache.ratis.server.impl.RaftServerProxy$ImplMap.close(RaftServerProxy.java:130)
at
org.apache.ratis.server.impl.RaftServerProxy$ImplMap.lambda$close$0(RaftServerProxy.java:118)
at
java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at
java.base/java.util.concurrent.ConcurrentHashMap$EntrySpliterator.forEachRemaining(ConcurrentHashMap.java:3645)
at
java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at
java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
at
java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
at
java.base/java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:408)
at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:736)
at
java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
at
java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
at
java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
at
java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
at
java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:661)
at
org.apache.ratis.server.impl.RaftServerProxy$ImplMap.close(RaftServerProxy.java:118)
at
org.apache.ratis.server.impl.RaftServerProxy.lambda$close$7(RaftServerProxy.java:399)
at
org.apache.ratis.util.LifeCycle.lambda$checkStateAndClose$4(LifeCycle.java:299)
at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:319)
at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:297)
at
org.apache.ratis.server.impl.RaftServerProxy.close(RaftServerProxy.java:397)
at
org.apache.ratis.server.impl.RaftServerProxy.handleJvmPause(RaftServerProxy.java:224)
at
org.apache.ratis.server.impl.RaftServerProxy.lambda$new$0(RaftServerProxy.java:218)
at org.apache.ratis.util.JvmPauseMonitor.handle(JvmPauseMonitor.java:132)
at
org.apache.ratis.util.JvmPauseMonitor.detectPause(JvmPauseMonitor.java:127)
at org.apache.ratis.util.JvmPauseMonitor.run(JvmPauseMonitor.java:106)
at java.base/java.lang.Thread.run(Thread.java:829)
2022-11-07 13:36:15,180
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$401/0x00000008405e0040@33ebe4f0]
INFO server.RaftServer$Division: om1@group-C5BA1605619E: closes. applyIndex: 12
2022-11-07 13:36:15,193 [om1@group-C5BA1605619E-SegmentedRaftLogWorker] INFO
segmented.SegmentedRaftLogWorker: om1@group-C5BA1605619E-SegmentedRaftLogWorker
was interrupted, exiting. There are 0 tasks remaining in the queue.
2022-11-07 13:36:15,206
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$401/0x00000008405e0040@33ebe4f0]
INFO segmented.SegmentedRaftLogWorker:
om1@group-C5BA1605619E-SegmentedRaftLogWorker close()
2022-11-07 13:36:15,229
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$401/0x00000008405e0040@33ebe4f0]
INFO server.GrpcService: om1: shutdown server with port 9872 now
2022-11-07 13:36:15,313
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$401/0x00000008405e0040@33ebe4f0]
INFO server.GrpcService: om1: shutdown server with port 9872 successfully
2022-11-07 13:36:15,314
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$401/0x00000008405e0040@33ebe4f0]
INFO util.JvmPauseMonitor: JvmPauseMonitor-om1: Stopped
2022-11-07 13:36:15,314 [om1@group-C5BA1605619E-StateMachineUpdater] INFO
ratis.OzoneManagerStateMachine: StateMachine has shutdown. Shutdown
OzoneManager if not already shutdown.
2022-11-07 13:36:15,315 [om1@group-C5BA1605619E-StateMachineUpdater] INFO
ratis.OzoneManagerDoubleBuffer: Stopping OMDoubleBuffer flush thread
2022-11-07 13:36:15,316 [OMDoubleBufferFlushThread] INFO
ratis.OzoneManagerDoubleBuffer: OMDoubleBuffer flush thread
OMDoubleBufferFlushThread is interrupted and will exit.
2022-11-07 13:36:15,339 [om1@group-C5BA1605619E-StateMachineUpdater] INFO
utils.BackgroundService: Shutting down service KeyDeletingService
2022-11-07 13:36:15,340 [om1@group-C5BA1605619E-StateMachineUpdater] INFO
utils.BackgroundService: Shutting down service DirectoryDeletingService
2022-11-07 13:36:15,341 [om1@group-C5BA1605619E-StateMachineUpdater] INFO
utils.BackgroundService: Shutting down service OpenKeyCleanupService
2022-11-07 13:36:15,770 [om1@group-C5BA1605619E-StateMachineUpdater] INFO
handler.ContextHandler: Stopped
o.e.j.w.WebAppContext@2e4ecdf8{ozoneManager,/,null,STOPPED}{jar:file:/opt/hadoop/share/ozone/lib/ozone-manager-1.3.0-SNAPSHOT.jar!/webapps/ozoneManager}
2022-11-07 13:36:15,798 [om1@group-C5BA1605619E-StateMachineUpdater] INFO
server.AbstractConnector: Stopped ServerConnector@3c9e19de{HTTP/1.1,
(http/1.1)}{0.0.0.0:9874}
2022-11-07 13:36:15,799 [om1@group-C5BA1605619E-StateMachineUpdater] INFO
server.session: node0 Stopped scavenging
2022-11-07 13:36:15,801 [om1@group-C5BA1605619E-StateMachineUpdater] INFO
handler.ContextHandler: Stopped
o.e.j.s.ServletContextHandler@70d5d96b{static,/static,jar:file:/opt/hadoop/share/ozone/lib/ozone-manager-1.3.0-SNAPSHOT.jar!/webapps/static,STOPPED}
2022-11-07 13:36:15,803 [om1@group-C5BA1605619E-StateMachineUpdater] INFO
handler.ContextHandler: Stopped
o.e.j.s.ServletContextHandler@10745a02{logs,/logs,file:///var/log/hadoop/,STOPPED}
Exception in thread "Trash Emptier" java.lang.UnsupportedOperationException:
fs.getUri() not implemented in TrashOzoneFileSystem
at
org.apache.hadoop.ozone.om.TrashOzoneFileSystem.getUri(TrashOzoneFileSystem.java:126)
at org.apache.hadoop.fs.FileSystem.close(FileSystem.java:2592)
at
org.apache.hadoop.ozone.om.TrashPolicyOzone$Emptier.run(TrashPolicyOzone.java:243)
at java.base/java.lang.Thread.run(Thread.java:829)
2022-11-07 13:36:15,835 [shutdown-hook-0] INFO om.OzoneManagerStarter:
SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down OzoneManager at 1919dc88d3a0/172.18.0.8
************************************************************/
2022-11-07 13:36:15,839 [shutdown-hook-0] INFO om.OzoneManager: om1[om:9862]:
Stopping Ozone Manager
2022-11-07 13:36:15,906 [om1@group-C5BA1605619E-StateMachineUpdater] ERROR
om.OzoneManager: Terminating with exit status 1: OM state machine is shutdown
by Ratis server {code}
SCM docker logs -
{code:java}
2022-11-07 13:24:10,662 [Over Replicated Processor] INFO
replication.OverReplicatedProcessor: Processed 0 over replicated containers,
failed processing 0
2022-11-07 13:24:10,751 [Under Replicated Processor] INFO
replication.UnderReplicatedProcessor: Processed 0 under replicated containers,
failed processing 0
2022-11-07 13:24:40,640 [Over Replicated Processor] INFO
replication.OverReplicatedProcessor: Processed 0 over replicated containers,
failed processing 0
2022-11-07 13:24:40,734 [Under Replicated Processor] INFO
replication.UnderReplicatedProcessor: Processed 0 under replicated containers,
failed processing 0
2022-11-07 13:25:10,619 [Over Replicated Processor] INFO
replication.OverReplicatedProcessor: Processed 0 over replicated containers,
failed processing 0
2022-11-07 13:25:10,713 [Under Replicated Processor] INFO
replication.UnderReplicatedProcessor: Processed 0 under replicated containers,
failed processing 0
2022-11-07 13:25:40,603 [Over Replicated Processor] INFO
replication.OverReplicatedProcessor: Processed 0 over replicated containers,
failed processing 0
2022-11-07 13:25:40,692 [Under Replicated Processor] INFO
replication.UnderReplicatedProcessor: Processed 0 under replicated containers,
failed processing 0
2022-11-07 13:26:10,585 [Over Replicated Processor] INFO
replication.OverReplicatedProcessor: Processed 0 over replicated containers,
failed processing 0
2022-11-07 13:26:10,672 [Under Replicated Processor] INFO
replication.UnderReplicatedProcessor: Processed 0 under replicated containers,
failed processing 0
2022-11-07 13:26:40,365 [ReplicationMonitor] INFO
replication.ReplicationManager: Replication Monitor Thread took 0 milliseconds
for processing 1 containers.
2022-11-07 13:26:40,563 [Over Replicated Processor] INFO
replication.OverReplicatedProcessor: Processed 0 over replicated containers,
failed processing 0
2022-11-07 13:26:40,650 [Under Replicated Processor] INFO
replication.UnderReplicatedProcessor: Processed 0 under replicated containers,
failed processing 0
2022-11-07 13:27:10,543 [Over Replicated Processor] INFO
replication.OverReplicatedProcessor: Processed 0 over replicated containers,
failed processing 0
2022-11-07 13:27:10,629 [Under Replicated Processor] INFO
replication.UnderReplicatedProcessor: Processed 0 under replicated containers,
failed processing 0
2022-11-07 13:27:40,522 [Over Replicated Processor] INFO
replication.OverReplicatedProcessor: Processed 0 over replicated containers,
failed processing 0
2022-11-07 13:27:40,608 [Under Replicated Processor] INFO
replication.UnderReplicatedProcessor: Processed 0 under replicated containers,
failed processing 0
2022-11-07 13:28:10,502 [Over Replicated Processor] INFO
replication.OverReplicatedProcessor: Processed 0 over replicated containers,
failed processing 0
2022-11-07 13:28:10,588 [Under Replicated Processor] INFO
replication.UnderReplicatedProcessor: Processed 0 under replicated containers,
failed processing 0
2022-11-07 13:28:40,481 [Over Replicated Processor] INFO
replication.OverReplicatedProcessor: Processed 0 over replicated containers,
failed processing 0
2022-11-07 13:28:40,567 [Under Replicated Processor] INFO
replication.UnderReplicatedProcessor: Processed 0 under replicated containers,
failed processing 0
2022-11-07 13:29:10,460 [Over Replicated Processor] INFO
replication.OverReplicatedProcessor: Processed 0 over replicated containers,
failed processing 0
2022-11-07 13:29:10,546 [Under Replicated Processor] INFO
replication.UnderReplicatedProcessor: Processed 0 under replicated containers,
failed processing 0
2022-11-07 13:29:40,439 [Over Replicated Processor] INFO
replication.OverReplicatedProcessor: Processed 0 over replicated containers,
failed processing 0
2022-11-07 13:29:40,526 [Under Replicated Processor] INFO
replication.UnderReplicatedProcessor: Processed 0 under replicated containers,
failed processing 0
2022-11-07 13:30:10,419 [Over Replicated Processor] INFO
replication.OverReplicatedProcessor: Processed 0 over replicated containers,
failed processing 0
2022-11-07 13:30:10,505 [Under Replicated Processor] INFO
replication.UnderReplicatedProcessor: Processed 0 under replicated containers,
failed processing 0
2022-11-07 13:30:40,398 [Over Replicated Processor] INFO
replication.OverReplicatedProcessor: Processed 0 over replicated containers,
failed processing 0
2022-11-07 13:30:40,484 [Under Replicated Processor] INFO
replication.UnderReplicatedProcessor: Processed 0 under replicated containers,
failed processing 0
2022-11-07 13:31:10,378 [Over Replicated Processor] INFO
replication.OverReplicatedProcessor: Processed 0 over replicated containers,
failed processing 0
2022-11-07 13:31:10,463 [Under Replicated Processor] INFO
replication.UnderReplicatedProcessor: Processed 0 under replicated containers,
failed processing 0
2022-11-07 13:31:40,150 [ReplicationMonitor] INFO
replication.ReplicationManager: Replication Monitor Thread took 0 milliseconds
for processing 1 containers.
2022-11-07 13:31:40,357 [Over Replicated Processor] INFO
replication.OverReplicatedProcessor: Processed 0 over replicated containers,
failed processing 0
2022-11-07 13:31:40,442 [Under Replicated Processor] INFO
replication.UnderReplicatedProcessor: Processed 0 under replicated containers,
failed processing 0
2022-11-07 13:32:10,335 [Over Replicated Processor] INFO
replication.OverReplicatedProcessor: Processed 0 over replicated containers,
failed processing 0
2022-11-07 13:32:10,422 [Under Replicated Processor] INFO
replication.UnderReplicatedProcessor: Processed 0 under replicated containers,
failed processing 0
2022-11-07 13:32:40,316 [Over Replicated Processor] INFO
replication.OverReplicatedProcessor: Processed 0 over replicated containers,
failed processing 0
2022-11-07 13:32:40,402 [Under Replicated Processor] INFO
replication.UnderReplicatedProcessor: Processed 0 under replicated containers,
failed processing 0
2022-11-07 13:36:12,869
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$421/0x0000000840471840@64ae105d]
WARN util.JvmPauseMonitor:
JvmPauseMonitor-082893a2-221a-4903-b5df-06f738fb1737: Detected pause in JVM or
host machine (eg GC): pause of approximately 207838872309ns. No GCs detected.
2022-11-07 13:36:12,879 [Under Replicated Processor] INFO
replication.UnderReplicatedProcessor: Processed 0 under replicated containers,
failed processing 0
2022-11-07 13:36:12,881 [Over Replicated Processor] INFO
replication.OverReplicatedProcessor: Processed 0 over replicated containers,
failed processing 0
2022-11-07 13:36:12,883 [SCM Heartbeat Processing Thread - 0] INFO
node.NodeStateManager: Detected long delay in scheduling HB processing thread.
Skipping heartbeat checks for one iteration.
2022-11-07 13:36:12,884
[org.apache.hadoop.util.JvmPauseMonitor$Monitor@4f5f474c] WARN
util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of
approximately 208235ms
No GCs detected
2022-11-07 13:36:13,059
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$421/0x0000000840471840@64ae105d]
INFO server.RaftServer: 082893a2-221a-4903-b5df-06f738fb1737: close
2022-11-07 13:36:13,091
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$421/0x0000000840471840@64ae105d]
INFO server.RaftServer$Division:
082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B: shutdown
2022-11-07 13:36:13,116
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$421/0x0000000840471840@64ae105d]
INFO util.JmxRegister: Successfully un-registered JMX Bean with object name
Ratis:service=RaftServer,group=group-C2AA3100223B,id=082893a2-221a-4903-b5df-06f738fb1737
2022-11-07 13:36:13,118
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$421/0x0000000840471840@64ae105d]
INFO impl.RoleInfo: 082893a2-221a-4903-b5df-06f738fb1737: shutdown
082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-LeaderStateImpl
2022-11-07 13:36:13,398
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$421/0x0000000840471840@64ae105d]
INFO impl.PendingRequests:
082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-PendingRequests:
sendNotLeaderResponses
2022-11-07 13:36:13,449
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$421/0x0000000840471840@64ae105d]
INFO ha.SCMStateMachine: current leader SCM steps down.
2022-11-07 13:36:13,450
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$421/0x0000000840471840@64ae105d]
INFO ha.SCMContext: update <isLeader,term> from <true,3> to <false,0>
2022-11-07 13:36:13,451
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$421/0x0000000840471840@64ae105d]
INFO ha.SCMContext: update <isLeaderReady> from <false> to <false>
2022-11-07 13:36:13,451
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$421/0x0000000840471840@64ae105d]
INFO BackgroundPipelineScrubber: Service BackgroundPipelineScrubber transitions
to PAUSING.
2022-11-07 13:36:13,451
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$421/0x0000000840471840@64ae105d]
INFO ExpiredContainerReplicaOpScrubber: Service
ExpiredContainerReplicaOpScrubber transitions to PAUSING.
2022-11-07 13:36:13,550
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$421/0x0000000840471840@64ae105d]
INFO impl.StateMachineUpdater:
082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater:
set stopIndex = 28
2022-11-07 13:36:13,944
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO ha.SCMStateMachine: Current Snapshot Index 28, takeSnapshot took 238 ms
2022-11-07 13:36:14,079
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO impl.StateMachineUpdater:
082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater:
Took a snapshot at index 28
2022-11-07 13:36:14,090
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO impl.StateMachineUpdater:
082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater:
snapshotIndex: updateIncreasingly -1 -> 28
2022-11-07 13:36:14,262
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO server.StorageContainerManager: Container Balancer is not running.
2022-11-07 13:36:14,263
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO server.StorageContainerManager: Stopping Replication Manager Service.
2022-11-07 13:36:14,263
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO replication.ReplicationManager: Stopping Replication Monitor Thread.
2022-11-07 13:36:14,265 [Over Replicated Processor] WARN
replication.OverReplicatedProcessor: Over Replicated Processor interrupted.
Exiting...
2022-11-07 13:36:14,277 [Under Replicated Processor] WARN
replication.UnderReplicatedProcessor: Under Replicated Processor interrupted.
Exiting...
2022-11-07 13:36:14,286
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO server.StorageContainerManager: Stopping the Datanode Admin Monitor.
2022-11-07 13:36:14,287 [ReplicationMonitor] INFO
replication.ReplicationManager: Replication Monitor Thread is stopped
2022-11-07 13:36:14,293
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO server.StorageContainerManager: Stopping Lease Manager of the command
watchers
2022-11-07 13:36:14,293
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO server.StorageContainerManager: Stopping datanode service RPC server
2022-11-07 13:36:14,295
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO server.SCMDatanodeProtocolServer: Stopping the RPC server for DataNodes
2022-11-07 13:36:14,305
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO ipc.Server: Stopping server on 9861
2022-11-07 13:36:14,430 [IPC Server listener on 9861] INFO ipc.Server: Stopping
IPC Server listener on 9861
2022-11-07 13:36:14,460 [IPC Server Responder] INFO ipc.Server: Stopping IPC
Server Responder
2022-11-07 13:36:15,884 [SCM Heartbeat Processing Thread - 0] WARN
node.NodeStateManager: Current Thread is interrupted, shutting down HB
processing thread for Node Manager.
2022-11-07 13:36:15,886
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO server.StorageContainerManager: Stopping block service RPC server
2022-11-07 13:36:15,888
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO server.SCMBlockProtocolServer: Stopping the RPC server for Block Protocol
2022-11-07 13:36:15,889
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO ipc.Server: Stopping server on 9863
2022-11-07 13:36:15,902 [IPC Server listener on 9863] INFO ipc.Server: Stopping
IPC Server listener on 9863
2022-11-07 13:36:15,902 [IPC Server Responder] INFO ipc.Server: Stopping IPC
Server Responder
2022-11-07 13:36:15,919
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO server.StorageContainerManager: Stopping the
StorageContainerLocationProtocol RPC server
2022-11-07 13:36:15,924
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO server.SCMClientProtocolServer: Stopping the RPC server for Client Protocol
2022-11-07 13:36:15,927
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO ipc.Server: Stopping server on 9860
2022-11-07 13:36:15,934 [IPC Server listener on 9860] INFO ipc.Server: Stopping
IPC Server listener on 9860
2022-11-07 13:36:15,940 [IPC Server Responder] INFO ipc.Server: Stopping IPC
Server Responder
2022-11-07 13:36:15,964
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO server.StorageContainerManager: Stopping Storage Container Manager HTTP
server.
2022-11-07 13:36:16,467
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO handler.ContextHandler: Stopped
o.e.j.w.WebAppContext@7cdb7fc{scm,/,null,STOPPED}{jar:file:/opt/hadoop/share/ozone/lib/hdds-server-scm-1.3.0-SNAPSHOT.jar!/webapps/scm}
2022-11-07 13:36:16,488
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO server.AbstractConnector: Stopped ServerConnector@42c54bad{HTTP/1.1,
(http/1.1)}{0.0.0.0:9876}
2022-11-07 13:36:16,491
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO server.session: node0 Stopped scavenging
2022-11-07 13:36:16,494
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO handler.ContextHandler: Stopped
o.e.j.s.ServletContextHandler@74c121d4{static,/static,jar:file:/opt/hadoop/share/ozone/lib/hdds-server-scm-1.3.0-SNAPSHOT.jar!/webapps/static,STOPPED}
2022-11-07 13:36:16,495
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO handler.ContextHandler: Stopped
o.e.j.s.ServletContextHandler@4bce62b{logs,/logs,file:///var/log/hadoop/,STOPPED}
2022-11-07 13:36:16,504
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO server.StorageContainerManager: Stopping SCM LayoutVersionManager Service.
2022-11-07 13:36:16,505
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO server.StorageContainerManager: Stopping Block Manager Service.
2022-11-07 13:36:16,506
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO utils.BackgroundService: Shutting down service SCMBlockDeletingService
2022-11-07 13:36:16,509
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO utils.BackgroundService: Shutting down service SCMBlockDeletingService
2022-11-07 13:36:16,520
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO server.StorageContainerManager: Stopping SCM Event Queue.
2022-11-07 13:36:16,549
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO server.StorageContainerManager: Stopping SCM HA services.
2022-11-07 13:36:16,549
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO ha.SCMRatisServerImpl: stopping ratis server 0.0.0.0:9894
2022-11-07 13:36:16,550
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$421/0x0000000840471840@64ae105d]
WARN server.RaftServer$Division:
082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B: Interrupted when
joining stateMachineUpdater
java.lang.InterruptedException
at java.base/java.lang.Object.wait(Native Method)
at java.base/java.lang.Thread.join(Thread.java:1300)
at java.base/java.lang.Thread.join(Thread.java:1375)
at
org.apache.ratis.server.impl.StateMachineUpdater.stopAndJoin(StateMachineUpdater.java:155)
at org.apache.ratis.server.impl.ServerState.close(ServerState.java:424)
at
org.apache.ratis.server.impl.RaftServerImpl.lambda$close$4(RaftServerImpl.java:456)
at
org.apache.ratis.util.LifeCycle.lambda$checkStateAndClose$4(LifeCycle.java:299)
at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:319)
at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:297)
at
org.apache.ratis.server.impl.RaftServerImpl.close(RaftServerImpl.java:433)
at
org.apache.ratis.server.impl.RaftServerProxy$ImplMap.close(RaftServerProxy.java:130)
at
org.apache.ratis.server.impl.RaftServerProxy$ImplMap.lambda$close$0(RaftServerProxy.java:118)
at
java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at
java.base/java.util.concurrent.ConcurrentHashMap$EntrySpliterator.forEachRemaining(ConcurrentHashMap.java:3645)
at
java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at
java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
at
java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
at
java.base/java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:408)
at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:736)
at
java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
at
java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
at
java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
at
java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
at
java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:661)
at
org.apache.ratis.server.impl.RaftServerProxy$ImplMap.close(RaftServerProxy.java:118)
at
org.apache.ratis.server.impl.RaftServerProxy.lambda$close$7(RaftServerProxy.java:399)
at
org.apache.ratis.util.LifeCycle.lambda$checkStateAndClose$4(LifeCycle.java:299)
at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:319)
at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:297)
at
org.apache.ratis.server.impl.RaftServerProxy.close(RaftServerProxy.java:397)
at
org.apache.ratis.server.impl.RaftServerProxy.handleJvmPause(RaftServerProxy.java:224)
at
org.apache.ratis.server.impl.RaftServerProxy.lambda$new$0(RaftServerProxy.java:218)
at org.apache.ratis.util.JvmPauseMonitor.handle(JvmPauseMonitor.java:132)
at
org.apache.ratis.util.JvmPauseMonitor.detectPause(JvmPauseMonitor.java:127)
at org.apache.ratis.util.JvmPauseMonitor.run(JvmPauseMonitor.java:106)
at java.base/java.lang.Thread.run(Thread.java:829)
2022-11-07 13:36:16,564
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$421/0x0000000840471840@64ae105d]
INFO server.RaftServer$Division:
082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B: closes. applyIndex: 28
2022-11-07 13:36:16,569
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-SegmentedRaftLogWorker]
INFO segmented.SegmentedRaftLogWorker:
082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-SegmentedRaftLogWorker
was interrupted, exiting. There are 0 tasks remaining in the queue.
2022-11-07 13:36:16,576
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$421/0x0000000840471840@64ae105d]
INFO segmented.SegmentedRaftLogWorker:
082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-SegmentedRaftLogWorker
close()
2022-11-07 13:36:16,595
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$421/0x0000000840471840@64ae105d]
INFO server.GrpcService: 082893a2-221a-4903-b5df-06f738fb1737: shutdown server
with port 9894 now
2022-11-07 13:36:16,643
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$421/0x0000000840471840@64ae105d]
INFO server.GrpcService: 082893a2-221a-4903-b5df-06f738fb1737: shutdown server
with port 9894 successfully
2022-11-07 13:36:16,645
[org.apache.ratis.util.JvmPauseMonitor$$Lambda$421/0x0000000840471840@64ae105d]
INFO util.JvmPauseMonitor:
JvmPauseMonitor-082893a2-221a-4903-b5df-06f738fb1737: Stopped
2022-11-07 13:36:16,661
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO pipeline.BackgroundPipelineCreator: Stopping RatisPipelineUtilsThread.
2022-11-07 13:36:16,661 [RatisPipelineUtilsThread - 0] WARN
pipeline.BackgroundPipelineCreator: RatisPipelineUtilsThread is interrupted.
2022-11-07 13:36:16,662
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO BackgroundPipelineScrubber: Stopping BackgroundPipelineScrubber Service.
2022-11-07 13:36:16,663 [BackgroundPipelineScrubberThread] WARN
BackgroundPipelineScrubber: BackgroundPipelineScrubber is interrupted, exit
2022-11-07 13:36:16,665
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
INFO server.StorageContainerManager: Stopping SCM MetadataStore.
2022-11-07 13:36:16,677 [shutdown-hook-0] INFO server.StorageContainerManager:
Container Balancer is not running.
2022-11-07 13:36:16,678 [shutdown-hook-0] INFO server.StorageContainerManager:
Stopping Replication Manager Service.
2022-11-07 13:36:16,678 [shutdown-hook-0] INFO replication.ReplicationManager:
Replication Monitor Thread is not running.
2022-11-07 13:36:16,678 [shutdown-hook-0] INFO server.StorageContainerManager:
Stopping the Datanode Admin Monitor.
2022-11-07 13:36:16,678 [shutdown-hook-0] INFO server.StorageContainerManager:
Stopping Lease Manager of the command watchers
2022-11-07 13:36:16,678 [shutdown-hook-0] INFO
server.StorageContainerManagerStarter: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down StorageContainerManager at 48b0cc8b7045/172.18.0.7
************************************************************/
2022-11-07 13:36:16,681 [shutdown-hook-0] ERROR server.StorageContainerManager:
Lease Manager of the command watchers stop failed
2022-11-07 13:36:16,681 [shutdown-hook-0] INFO server.StorageContainerManager:
Stopping datanode service RPC server
2022-11-07 13:36:16,681 [shutdown-hook-0] INFO
server.SCMDatanodeProtocolServer: Stopping the RPC server for DataNodes
2022-11-07 13:36:16,684 [shutdown-hook-0] INFO ipc.Server: Stopping server on
9861
2022-11-07 13:36:16,685 [shutdown-hook-0] INFO server.StorageContainerManager:
Stopping block service RPC server
2022-11-07 13:36:16,685 [shutdown-hook-0] INFO server.SCMBlockProtocolServer:
Stopping the RPC server for Block Protocol
2022-11-07 13:36:16,686 [shutdown-hook-0] INFO ipc.Server: Stopping server on
9863
2022-11-07 13:36:16,686 [shutdown-hook-0] INFO server.StorageContainerManager:
Stopping the StorageContainerLocationProtocol RPC server
2022-11-07 13:36:16,686 [shutdown-hook-0] INFO server.SCMClientProtocolServer:
Stopping the RPC server for Client Protocol
2022-11-07 13:36:16,688 [shutdown-hook-0] INFO ipc.Server: Stopping server on
9860
2022-11-07 13:36:16,688 [shutdown-hook-0] INFO server.StorageContainerManager:
Stopping Storage Container Manager HTTP server.
2022-11-07 13:36:16,689 [shutdown-hook-0] INFO server.StorageContainerManager:
Stopping SCM LayoutVersionManager Service.
2022-11-07 13:36:16,689 [shutdown-hook-0] INFO server.StorageContainerManager:
Stopping Block Manager Service.
2022-11-07 13:36:16,689 [shutdown-hook-0] INFO utils.BackgroundService:
Shutting down service SCMBlockDeletingService
2022-11-07 13:36:16,689 [shutdown-hook-0] INFO utils.BackgroundService:
Shutting down service SCMBlockDeletingService
2022-11-07 13:36:16,689 [shutdown-hook-0] INFO server.StorageContainerManager:
Stopping SCM Event Queue.
2022-11-07 13:36:16,690 [shutdown-hook-0] INFO server.StorageContainerManager:
Stopping SCM HA services.
2022-11-07 13:36:16,690 [shutdown-hook-0] INFO ha.SCMRatisServerImpl: stopping
ratis server 0.0.0.0:9894
2022-11-07 13:36:16,692 [shutdown-hook-0] WARN
pipeline.BackgroundPipelineCreator: RatisPipelineUtilsThread is not running,
just ignore.
2022-11-07 13:36:16,692 [shutdown-hook-0] INFO BackgroundPipelineScrubber:
BackgroundPipelineScrubber Service is not running, skip stop.
2022-11-07 13:36:16,692 [shutdown-hook-0] INFO server.StorageContainerManager:
Stopping SCM MetadataStore.
2022-11-07 13:36:16,698 [shutdown-hook-0] INFO impl.MetricsSystemImpl: Stopping
StorageContainerManager metrics system...
2022-11-07 13:36:16,710 [prometheus] INFO impl.MetricsSinkAdapter: prometheus
thread interrupted.
2022-11-07 13:36:16,713 [shutdown-hook-0] INFO impl.MetricsSystemImpl:
StorageContainerManager metrics system stopped.
2022-11-07 13:36:16,723
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
WARN impl.MetricsSystemImpl: StorageContainerManager metrics system not yet
started!
org.apache.hadoop.metrics2.MetricsException: Illegal stop
at
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.stop(MetricsSystemImpl.java:199)
at
org.apache.hadoop.hdds.scm.server.StorageContainerManager.stop(StorageContainerManager.java:1647)
at
org.apache.hadoop.hdds.scm.server.StorageContainerManager.shutDown(StorageContainerManager.java:1655)
at
org.apache.hadoop.hdds.scm.ha.SCMStateMachine.close(SCMStateMachine.java:403)
at
org.apache.ratis.server.impl.StateMachineUpdater.stop(StateMachineUpdater.java:134)
at
org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:187)
at java.base/java.lang.Thread.run(Thread.java:829)
2022-11-07 13:36:16,735
[082893a2-221a-4903-b5df-06f738fb1737@group-C2AA3100223B-StateMachineUpdater]
ERROR server.StorageContainerManager: Terminating with exit status 1: scm
statemachine is closed by ratis, terminate SCM {code}
> [snapshot] OM Process exiting on idle cluster
> ---------------------------------------------
>
> Key: HDDS-7374
> URL: https://issues.apache.org/jira/browse/HDDS-7374
> Project: Apache Ozone
> Issue Type: Bug
> Components: Ozone Manager
> Affects Versions: 1.3.0
> Environment: docker-compose on HDDS-6517-Snapshot
> Reporter: Jyotirmoy Sinha
> Assignee: Hemant Kumar
> Priority: Critical
> Labels: ozone-snapshot
>
> Snapshots were created on the cluster, after which the system was kept idle.
> After certain interval the OM Process exited and shut down.
> Docker logs -
> {code:java}
> 2022-10-20 12:53:31,082
> [org.apache.ratis.util.JvmPauseMonitor$$Lambda$398/0x00000008404cac40@bfe47a8]
> WARN util.JvmPauseMonitor: JvmPauseMonitor-om1: Detected pause in JVM or
> host machine (eg GC): pause of approximately 7196529648812ns. No GCs detected.
> 2022-10-20 12:53:27,062
> [org.apache.ratis.util.JvmPauseMonitor$$Lambda$398/0x00000008404cac40@bfe47a8]
> INFO server.RaftServer: om1: close
> 2022-10-20 12:53:27,128
> [org.apache.hadoop.util.JvmPauseMonitor$Monitor@60844992] WARN
> util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of
> approximately 7197056ms
> No GCs detected
> 2022-10-20 12:53:27,166
> [org.apache.ratis.util.JvmPauseMonitor$$Lambda$398/0x00000008404cac40@bfe47a8]
> INFO server.RaftServer$Division: om1@group-C5BA1605619E: shutdown
> 2022-10-20 12:53:27,177
> [org.apache.ratis.util.JvmPauseMonitor$$Lambda$398/0x00000008404cac40@bfe47a8]
> INFO util.JmxRegister: Successfully un-registered JMX Bean with object name
> Ratis:service=RaftServer,group=group-C5BA1605619E,id=om1
> 2022-10-20 12:53:27,196
> [org.apache.ratis.util.JvmPauseMonitor$$Lambda$398/0x00000008404cac40@bfe47a8]
> INFO impl.RoleInfo: om1: shutdown om1@group-C5BA1605619E-LeaderStateImpl
> 2022-10-20 12:53:27,212
> [org.apache.ratis.util.JvmPauseMonitor$$Lambda$398/0x00000008404cac40@bfe47a8]
> INFO impl.PendingRequests: om1@group-C5BA1605619E-PendingRequests:
> sendNotLeaderResponses
> 2022-10-20 12:53:27,366
> [org.apache.ratis.util.JvmPauseMonitor$$Lambda$398/0x00000008404cac40@bfe47a8]
> INFO impl.StateMachineUpdater: om1@group-C5BA1605619E-StateMachineUpdater:
> set stopIndex = 10
> 2022-10-20 12:53:27,368 [om1@group-C5BA1605619E-StateMachineUpdater] INFO
> ratis.OzoneManagerStateMachine: Current Snapshot Index (t:1, i:10)
> 2022-10-20 12:53:27,390 [om1@group-C5BA1605619E-StateMachineUpdater] INFO
> impl.StateMachineUpdater: om1@group-C5BA1605619E-StateMachineUpdater: Took a
> snapshot at index 10
> 2022-10-20 12:53:27,398 [om1@group-C5BA1605619E-StateMachineUpdater] INFO
> impl.StateMachineUpdater: om1@group-C5BA1605619E-StateMachineUpdater:
> snapshotIndex: updateIncreasingly -1 -> 10
> 2022-10-20 12:53:27,398 [om1@group-C5BA1605619E-StateMachineUpdater] INFO
> ratis.OzoneManagerStateMachine: StateMachine has shutdown. Shutdown
> OzoneManager if not already shutdown.
> 2022-10-20 12:53:27,399 [om1@group-C5BA1605619E-StateMachineUpdater] INFO
> om.OzoneManager: om1[om:9862]: Stopping Ozone Manager
> 2022-10-20 12:53:27,474 [om1@group-C5BA1605619E-StateMachineUpdater] INFO
> ipc.Server: Stopping server on 9862
> 2022-10-20 12:53:27,609 [IPC Server listener on 9862] INFO ipc.Server:
> Stopping IPC Server listener on 9862
> 2022-10-20 12:53:27,628 [IPC Server Responder] INFO ipc.Server: Stopping IPC
> Server Responder {code}
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]