[ 
https://issues.apache.org/jira/browse/HDDS-8945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyao Meng updated HDDS-8945:
-----------------------------
    Description: 
The integration test `TestOmSnapshotDisabledRestart` added in HDDS-8555 would 
occasionally pass, but most of the time it would fail at the OM restart step 
because om.stop() is stuck at Ratis retrying to quit, resulting in OM main 
RocksDB LOCK not being released when it is restarting, printing a ton of 
{{Failed appendEntries: 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
exception}} warning message before it times out:

{code}
2023-06-27 20:36:27,174 [omNode-3-impl-thread1] INFO  impl.RoleInfo 
(RoleInfo.java:shutdownFollowerState(110)) - omNode-3: shutdown 
omNode-3@group-6E8B7B074D31-FollowerState
2023-06-27 20:36:27,174 [omNode-3@group-6E8B7B074D31-FollowerState] INFO  
impl.FollowerState (FollowerState.java:run(152)) - 
omNode-3@group-6E8B7B074D31-FollowerState was interrupted
2023-06-27 20:36:27,174 [omNode-3-impl-thread1] INFO  impl.StateMachineUpdater 
(StateMachineUpdater.java:stopAndJoin(155)) - 
omNode-3@group-6E8B7B074D31-StateMachineUpdater: set stopIndex = 5
2023-06-27 20:36:27,175 [OM StateMachine ApplyTransaction Thread - 0] INFO  
snapshot.OMSnapshotCreateRequest 
(OMSnapshotCreateRequest.java:validateAndUpdateCache(193)) - Created snapshot: 
'snap-48567' with snapshotId: '4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa' under path 
'vol-47883/buck-56159'
2023-06-27 20:36:27,175 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] INFO  
ratis.OzoneManagerStateMachine 
(OzoneManagerStateMachine.java:takeSnapshot(479)) - Current Snapshot Index 
(t:1, i:4)
2023-06-27 20:36:27,178 [OM StateMachine ApplyTransaction Thread - 0] INFO  
snapshot.OMSnapshotCreateRequest 
(OMSnapshotCreateRequest.java:validateAndUpdateCache(193)) - Created snapshot: 
'snap-48567' with snapshotId: '4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa' under path 
'vol-47883/buck-56159'
2023-06-27 20:36:27,180 [grpc-default-executor-6] WARN  
server.GrpcServerProtocolService (LogUtils.java:warn(122)) - omNode-3: 
APPEND_ENTRIES onError, lastRequest: null: 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: client 
cancelled
2023-06-27 20:36:27,180 [grpc-default-executor-2] WARN  
server.GrpcServerProtocolService (LogUtils.java:warn(122)) - omNode-3: 
APPEND_ENTRIES onError, lastRequest: omNode-1->omNode-3#38-t1,previous=(t:1, 
i:5),leaderCommit=5,initializing? true,entries: size=1, first=(t:1, i:6), 
METADATAENTRY(c:5): org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
CANCELLED: client cancelled
2023-06-27 20:36:27,180 [grpc-default-executor-5] WARN  
server.GrpcServerProtocolService (LogUtils.java:warn(122)) - omNode-3: 
APPEND_ENTRIES onError, lastRequest: null: 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: client 
cancelled
2023-06-27 20:36:27,181 [main] INFO  server.GrpcService 
(GrpcService.java:closeImpl(320)) - omNode-3: shutdown server 
GrpcServerProtocolService successfully
2023-06-27 20:36:27,181 [grpc-default-executor-5] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
CANCELLED: RST_STREAM closed stream. HTTP/2 error code: CANCEL
2023-06-27 20:36:27,181 [grpc-default-executor-2] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
CANCELLED: RST_STREAM closed stream. HTTP/2 error code: CANCEL
2023-06-27 20:36:27,181 [grpc-default-executor-6] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
CANCELLED: RST_STREAM closed stream. HTTP/2 error code: CANCEL
2023-06-27 20:36:27,182 [grpc-default-executor-5] INFO  leader.FollowerInfo 
(FollowerInfoImpl.java:lambda$new$0(52)) - 
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
updateUnconditionally 7 -> 6
2023-06-27 20:36:27,182 [grpc-default-executor-2] INFO  leader.FollowerInfo 
(FollowerInfoImpl.java:lambda$new$0(52)) - 
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
updateUnconditionally 6 -> 5
2023-06-27 20:36:27,182 [grpc-default-executor-6] INFO  leader.FollowerInfo 
(FollowerInfoImpl.java:lambda$new$0(52)) - 
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
updateUnconditionally 5 -> 4
2023-06-27 20:36:27,349 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] INFO  
impl.StateMachineUpdater (StateMachineUpdater.java:takeSnapshot(291)) - 
omNode-3@group-6E8B7B074D31-StateMachineUpdater: Took a snapshot at index 4
2023-06-27 20:36:27,349 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] INFO  
impl.StateMachineUpdater (StateMachineUpdater.java:lambda$new$0(97)) - 
omNode-3@group-6E8B7B074D31-StateMachineUpdater: snapshotIndex: 
updateIncreasingly -1 -> 4
2023-06-27 20:36:27,350 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] INFO  
ratis.OzoneManagerStateMachine 
(OzoneManagerStateMachine.java:takeSnapshot(479)) - Current Snapshot Index 
(t:1, i:4)
2023-06-27 20:36:27,365 [OMDoubleBufferFlushThread] INFO  
db.RDBCheckpointManager (RDBCheckpointManager.java:createCheckpoint(89)) - 
Created checkpoint in rocksDB at 
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-1/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
 in 194 milliseconds
2023-06-27 20:36:27,379 [OMDoubleBufferFlushThread] INFO  db.RDBCheckpointUtils 
(RDBCheckpointUtils.java:waitForCheckpointDirectoryExist(59)) - Waited for 13 
milliseconds for checkpoint directory 
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-1/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
 availability.
2023-06-27 20:36:27,381 [OMDoubleBufferFlushThread] INFO  om.OmSnapshotManager 
(OmSnapshotManager.java:createOmSnapshotCheckpoint(468)) - Created checkpoint : 
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-1/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
 for snapshot snap-48567
2023-06-27 20:36:27,392 [OMDoubleBufferFlushThread] INFO  
db.RDBCheckpointManager (RDBCheckpointManager.java:createCheckpoint(89)) - 
Created checkpoint in rocksDB at 
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-2/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
 in 198 milliseconds
2023-06-27 20:36:27,392 [OMDoubleBufferFlushThread] INFO  db.RDBCheckpointUtils 
(RDBCheckpointUtils.java:waitForCheckpointDirectoryExist(59)) - Waited for 0 
milliseconds for checkpoint directory 
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-2/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
 availability.
2023-06-27 20:36:27,393 [OMDoubleBufferFlushThread] INFO  om.OmSnapshotManager 
(OmSnapshotManager.java:createOmSnapshotCheckpoint(468)) - Created checkpoint : 
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-2/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
 for snapshot snap-48567
2023-06-27 20:36:27,396 [OMDoubleBufferFlushThread] INFO  
db.RDBCheckpointManager (RDBCheckpointManager.java:createCheckpoint(89)) - 
Created checkpoint in rocksDB at 
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-3/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
 in 206 milliseconds
2023-06-27 20:36:27,396 [OMDoubleBufferFlushThread] INFO  db.RDBCheckpointUtils 
(RDBCheckpointUtils.java:waitForCheckpointDirectoryExist(59)) - Waited for 0 
milliseconds for checkpoint directory 
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-3/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
 availability.
2023-06-27 20:36:27,397 [OMDoubleBufferFlushThread] INFO  om.OmSnapshotManager 
(OmSnapshotManager.java:createOmSnapshotCheckpoint(468)) - Created checkpoint : 
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-3/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
 for snapshot snap-48567
2023-06-27 20:36:27,399 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] INFO  
impl.StateMachineUpdater (StateMachineUpdater.java:takeSnapshot(291)) - 
omNode-3@group-6E8B7B074D31-StateMachineUpdater: Took a snapshot at index 4
2023-06-27 20:36:27,399 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] INFO  
ratis.OzoneManagerStateMachine (OzoneManagerStateMachine.java:close(533)) - 
StateMachine has shutdown. Shutdown OzoneManager if not already shutdown.
2023-06-27 20:36:27,399 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] INFO  
ratis.OzoneManagerDoubleBuffer (OzoneManagerDoubleBuffer.java:stopDaemon(551)) 
- Stopping OMDoubleBuffer flush thread
2023-06-27 20:36:27,400 [omNode-3-impl-thread1] INFO  
server.RaftServer$Division (ServerState.java:close(472)) - 
omNode-3@group-6E8B7B074D31: closes. applyIndex: 5
2023-06-27 20:36:27,680 [grpc-default-executor-5] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
2023-06-27 20:36:27,681 [grpc-default-executor-2] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
2023-06-27 20:36:27,681 [grpc-default-executor-5] INFO  leader.FollowerInfo 
(FollowerInfoImpl.java:lambda$new$0(52)) - 
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
updateUnconditionally 4 -> 3
2023-06-27 20:36:27,681 [grpc-default-executor-2] INFO  leader.FollowerInfo 
(FollowerInfoImpl.java:lambda$new$0(52)) - 
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
updateUnconditionally 3 -> 2
2023-06-27 20:36:27,683 [grpc-default-executor-5] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
2023-06-27 20:36:27,683 [grpc-default-executor-2] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
2023-06-27 20:36:27,684 [grpc-default-executor-5] INFO  leader.FollowerInfo 
(FollowerInfoImpl.java:lambda$new$0(52)) - 
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
updateUnconditionally 7 -> 6
2023-06-27 20:36:27,684 [grpc-default-executor-2] INFO  leader.FollowerInfo 
(FollowerInfoImpl.java:lambda$new$0(52)) - 
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
updateUnconditionally 6 -> 5
2023-06-27 20:36:27,690 [grpc-default-executor-2] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
2023-06-27 20:36:27,690 [grpc-default-executor-5] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
2023-06-27 20:36:27,690 [grpc-default-executor-2] INFO  leader.FollowerInfo 
(FollowerInfoImpl.java:lambda$new$0(52)) - 
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
updateUnconditionally 7 -> 6
2023-06-27 20:36:27,690 [grpc-default-executor-5] INFO  leader.FollowerInfo 
(FollowerInfoImpl.java:lambda$new$0(52)) - 
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
updateUnconditionally 6 -> 5
2023-06-27 20:36:27,930 [grpc-default-executor-5] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
2023-06-27 20:36:27,930 [grpc-default-executor-2] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
2023-06-27 20:36:27,932 [grpc-default-executor-5] INFO  leader.FollowerInfo 
(FollowerInfoImpl.java:lambda$new$0(52)) - 
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
updateUnconditionally 5 -> 4
2023-06-27 20:36:27,932 [grpc-default-executor-2] INFO  leader.FollowerInfo 
(FollowerInfoImpl.java:lambda$new$0(52)) - 
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
updateUnconditionally 4 -> 3
{code}

{code}
2023-06-27 20:36:57,356 [grpc-default-executor-5] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
2023-06-27 20:36:57,356 [grpc-default-executor-6] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
2023-06-27 20:36:57,607 [grpc-default-executor-6] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
2023-06-27 20:36:57,607 [grpc-default-executor-5] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
2023-06-27 20:36:57,861 [grpc-default-executor-6] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
2023-06-27 20:36:57,861 [grpc-default-executor-5] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
2023-06-27 20:36:58,116 [grpc-default-executor-5] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
2023-06-27 20:36:58,116 [grpc-default-executor-6] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
{code}

Attached is a diff in MiniOzoneHA to help debug this issue. Apply the diff and 
run `TestOmSnapshotDisabledRestart#testSnapshotFeatureFlag` to repro.

  was:
The integration test `TestOmSnapshotDisabledRestart` added in HDDS-8555 would 
occasionally pass, but most of the time it would fail at the OM restart step 
because om.stop() is stuck at Ratis retrying to quit, resulting in OM main 
RocksDB LOCK not being released when it is restarting, printing a ton of 
{{Failed appendEntries: 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
exception}} WARM message before it times out:

{code}
2023-06-27 20:36:27,174 [omNode-3-impl-thread1] INFO  impl.RoleInfo 
(RoleInfo.java:shutdownFollowerState(110)) - omNode-3: shutdown 
omNode-3@group-6E8B7B074D31-FollowerState
2023-06-27 20:36:27,174 [omNode-3@group-6E8B7B074D31-FollowerState] INFO  
impl.FollowerState (FollowerState.java:run(152)) - 
omNode-3@group-6E8B7B074D31-FollowerState was interrupted
2023-06-27 20:36:27,174 [omNode-3-impl-thread1] INFO  impl.StateMachineUpdater 
(StateMachineUpdater.java:stopAndJoin(155)) - 
omNode-3@group-6E8B7B074D31-StateMachineUpdater: set stopIndex = 5
2023-06-27 20:36:27,175 [OM StateMachine ApplyTransaction Thread - 0] INFO  
snapshot.OMSnapshotCreateRequest 
(OMSnapshotCreateRequest.java:validateAndUpdateCache(193)) - Created snapshot: 
'snap-48567' with snapshotId: '4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa' under path 
'vol-47883/buck-56159'
2023-06-27 20:36:27,175 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] INFO  
ratis.OzoneManagerStateMachine 
(OzoneManagerStateMachine.java:takeSnapshot(479)) - Current Snapshot Index 
(t:1, i:4)
2023-06-27 20:36:27,178 [OM StateMachine ApplyTransaction Thread - 0] INFO  
snapshot.OMSnapshotCreateRequest 
(OMSnapshotCreateRequest.java:validateAndUpdateCache(193)) - Created snapshot: 
'snap-48567' with snapshotId: '4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa' under path 
'vol-47883/buck-56159'
2023-06-27 20:36:27,180 [grpc-default-executor-6] WARN  
server.GrpcServerProtocolService (LogUtils.java:warn(122)) - omNode-3: 
APPEND_ENTRIES onError, lastRequest: null: 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: client 
cancelled
2023-06-27 20:36:27,180 [grpc-default-executor-2] WARN  
server.GrpcServerProtocolService (LogUtils.java:warn(122)) - omNode-3: 
APPEND_ENTRIES onError, lastRequest: omNode-1->omNode-3#38-t1,previous=(t:1, 
i:5),leaderCommit=5,initializing? true,entries: size=1, first=(t:1, i:6), 
METADATAENTRY(c:5): org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
CANCELLED: client cancelled
2023-06-27 20:36:27,180 [grpc-default-executor-5] WARN  
server.GrpcServerProtocolService (LogUtils.java:warn(122)) - omNode-3: 
APPEND_ENTRIES onError, lastRequest: null: 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: client 
cancelled
2023-06-27 20:36:27,181 [main] INFO  server.GrpcService 
(GrpcService.java:closeImpl(320)) - omNode-3: shutdown server 
GrpcServerProtocolService successfully
2023-06-27 20:36:27,181 [grpc-default-executor-5] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
CANCELLED: RST_STREAM closed stream. HTTP/2 error code: CANCEL
2023-06-27 20:36:27,181 [grpc-default-executor-2] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
CANCELLED: RST_STREAM closed stream. HTTP/2 error code: CANCEL
2023-06-27 20:36:27,181 [grpc-default-executor-6] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
CANCELLED: RST_STREAM closed stream. HTTP/2 error code: CANCEL
2023-06-27 20:36:27,182 [grpc-default-executor-5] INFO  leader.FollowerInfo 
(FollowerInfoImpl.java:lambda$new$0(52)) - 
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
updateUnconditionally 7 -> 6
2023-06-27 20:36:27,182 [grpc-default-executor-2] INFO  leader.FollowerInfo 
(FollowerInfoImpl.java:lambda$new$0(52)) - 
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
updateUnconditionally 6 -> 5
2023-06-27 20:36:27,182 [grpc-default-executor-6] INFO  leader.FollowerInfo 
(FollowerInfoImpl.java:lambda$new$0(52)) - 
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
updateUnconditionally 5 -> 4
2023-06-27 20:36:27,349 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] INFO  
impl.StateMachineUpdater (StateMachineUpdater.java:takeSnapshot(291)) - 
omNode-3@group-6E8B7B074D31-StateMachineUpdater: Took a snapshot at index 4
2023-06-27 20:36:27,349 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] INFO  
impl.StateMachineUpdater (StateMachineUpdater.java:lambda$new$0(97)) - 
omNode-3@group-6E8B7B074D31-StateMachineUpdater: snapshotIndex: 
updateIncreasingly -1 -> 4
2023-06-27 20:36:27,350 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] INFO  
ratis.OzoneManagerStateMachine 
(OzoneManagerStateMachine.java:takeSnapshot(479)) - Current Snapshot Index 
(t:1, i:4)
2023-06-27 20:36:27,365 [OMDoubleBufferFlushThread] INFO  
db.RDBCheckpointManager (RDBCheckpointManager.java:createCheckpoint(89)) - 
Created checkpoint in rocksDB at 
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-1/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
 in 194 milliseconds
2023-06-27 20:36:27,379 [OMDoubleBufferFlushThread] INFO  db.RDBCheckpointUtils 
(RDBCheckpointUtils.java:waitForCheckpointDirectoryExist(59)) - Waited for 13 
milliseconds for checkpoint directory 
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-1/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
 availability.
2023-06-27 20:36:27,381 [OMDoubleBufferFlushThread] INFO  om.OmSnapshotManager 
(OmSnapshotManager.java:createOmSnapshotCheckpoint(468)) - Created checkpoint : 
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-1/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
 for snapshot snap-48567
2023-06-27 20:36:27,392 [OMDoubleBufferFlushThread] INFO  
db.RDBCheckpointManager (RDBCheckpointManager.java:createCheckpoint(89)) - 
Created checkpoint in rocksDB at 
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-2/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
 in 198 milliseconds
2023-06-27 20:36:27,392 [OMDoubleBufferFlushThread] INFO  db.RDBCheckpointUtils 
(RDBCheckpointUtils.java:waitForCheckpointDirectoryExist(59)) - Waited for 0 
milliseconds for checkpoint directory 
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-2/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
 availability.
2023-06-27 20:36:27,393 [OMDoubleBufferFlushThread] INFO  om.OmSnapshotManager 
(OmSnapshotManager.java:createOmSnapshotCheckpoint(468)) - Created checkpoint : 
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-2/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
 for snapshot snap-48567
2023-06-27 20:36:27,396 [OMDoubleBufferFlushThread] INFO  
db.RDBCheckpointManager (RDBCheckpointManager.java:createCheckpoint(89)) - 
Created checkpoint in rocksDB at 
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-3/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
 in 206 milliseconds
2023-06-27 20:36:27,396 [OMDoubleBufferFlushThread] INFO  db.RDBCheckpointUtils 
(RDBCheckpointUtils.java:waitForCheckpointDirectoryExist(59)) - Waited for 0 
milliseconds for checkpoint directory 
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-3/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
 availability.
2023-06-27 20:36:27,397 [OMDoubleBufferFlushThread] INFO  om.OmSnapshotManager 
(OmSnapshotManager.java:createOmSnapshotCheckpoint(468)) - Created checkpoint : 
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-3/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
 for snapshot snap-48567
2023-06-27 20:36:27,399 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] INFO  
impl.StateMachineUpdater (StateMachineUpdater.java:takeSnapshot(291)) - 
omNode-3@group-6E8B7B074D31-StateMachineUpdater: Took a snapshot at index 4
2023-06-27 20:36:27,399 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] INFO  
ratis.OzoneManagerStateMachine (OzoneManagerStateMachine.java:close(533)) - 
StateMachine has shutdown. Shutdown OzoneManager if not already shutdown.
2023-06-27 20:36:27,399 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] INFO  
ratis.OzoneManagerDoubleBuffer (OzoneManagerDoubleBuffer.java:stopDaemon(551)) 
- Stopping OMDoubleBuffer flush thread
2023-06-27 20:36:27,400 [omNode-3-impl-thread1] INFO  
server.RaftServer$Division (ServerState.java:close(472)) - 
omNode-3@group-6E8B7B074D31: closes. applyIndex: 5
2023-06-27 20:36:27,680 [grpc-default-executor-5] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
2023-06-27 20:36:27,681 [grpc-default-executor-2] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
2023-06-27 20:36:27,681 [grpc-default-executor-5] INFO  leader.FollowerInfo 
(FollowerInfoImpl.java:lambda$new$0(52)) - 
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
updateUnconditionally 4 -> 3
2023-06-27 20:36:27,681 [grpc-default-executor-2] INFO  leader.FollowerInfo 
(FollowerInfoImpl.java:lambda$new$0(52)) - 
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
updateUnconditionally 3 -> 2
2023-06-27 20:36:27,683 [grpc-default-executor-5] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
2023-06-27 20:36:27,683 [grpc-default-executor-2] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
2023-06-27 20:36:27,684 [grpc-default-executor-5] INFO  leader.FollowerInfo 
(FollowerInfoImpl.java:lambda$new$0(52)) - 
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
updateUnconditionally 7 -> 6
2023-06-27 20:36:27,684 [grpc-default-executor-2] INFO  leader.FollowerInfo 
(FollowerInfoImpl.java:lambda$new$0(52)) - 
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
updateUnconditionally 6 -> 5
2023-06-27 20:36:27,690 [grpc-default-executor-2] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
2023-06-27 20:36:27,690 [grpc-default-executor-5] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
2023-06-27 20:36:27,690 [grpc-default-executor-2] INFO  leader.FollowerInfo 
(FollowerInfoImpl.java:lambda$new$0(52)) - 
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
updateUnconditionally 7 -> 6
2023-06-27 20:36:27,690 [grpc-default-executor-5] INFO  leader.FollowerInfo 
(FollowerInfoImpl.java:lambda$new$0(52)) - 
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
updateUnconditionally 6 -> 5
2023-06-27 20:36:27,930 [grpc-default-executor-5] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
2023-06-27 20:36:27,930 [grpc-default-executor-2] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
2023-06-27 20:36:27,932 [grpc-default-executor-5] INFO  leader.FollowerInfo 
(FollowerInfoImpl.java:lambda$new$0(52)) - 
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
updateUnconditionally 5 -> 4
2023-06-27 20:36:27,932 [grpc-default-executor-2] INFO  leader.FollowerInfo 
(FollowerInfoImpl.java:lambda$new$0(52)) - 
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
updateUnconditionally 4 -> 3
{code}

{code}
2023-06-27 20:36:57,356 [grpc-default-executor-5] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
2023-06-27 20:36:57,356 [grpc-default-executor-6] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
2023-06-27 20:36:57,607 [grpc-default-executor-6] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
2023-06-27 20:36:57,607 [grpc-default-executor-5] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
2023-06-27 20:36:57,861 [grpc-default-executor-6] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
2023-06-27 20:36:57,861 [grpc-default-executor-5] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
2023-06-27 20:36:58,116 [grpc-default-executor-5] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
2023-06-27 20:36:58,116 [grpc-default-executor-6] WARN  server.GrpcLogAppender 
(LogUtils.java:warn(122)) - 
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
{code}

Attached is a diff in MiniOzoneHA to help debug this issue. Apply the diff and 
run `TestOmSnapshotDisabledRestart#testSnapshotFeatureFlag` to repro.


> Investigate potential Ratis issue or incorrect usage in OM integrationt test
> ----------------------------------------------------------------------------
>
>                 Key: HDDS-8945
>                 URL: https://issues.apache.org/jira/browse/HDDS-8945
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Siyao Meng
>            Priority: Major
>         Attachments: HDDS-8945.debug.001.diff
>
>
> The integration test `TestOmSnapshotDisabledRestart` added in HDDS-8555 would 
> occasionally pass, but most of the time it would fail at the OM restart step 
> because om.stop() is stuck at Ratis retrying to quit, resulting in OM main 
> RocksDB LOCK not being released when it is restarting, printing a ton of 
> {{Failed appendEntries: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception}} warning message before it times out:
> {code}
> 2023-06-27 20:36:27,174 [omNode-3-impl-thread1] INFO  impl.RoleInfo 
> (RoleInfo.java:shutdownFollowerState(110)) - omNode-3: shutdown 
> omNode-3@group-6E8B7B074D31-FollowerState
> 2023-06-27 20:36:27,174 [omNode-3@group-6E8B7B074D31-FollowerState] INFO  
> impl.FollowerState (FollowerState.java:run(152)) - 
> omNode-3@group-6E8B7B074D31-FollowerState was interrupted
> 2023-06-27 20:36:27,174 [omNode-3-impl-thread1] INFO  
> impl.StateMachineUpdater (StateMachineUpdater.java:stopAndJoin(155)) - 
> omNode-3@group-6E8B7B074D31-StateMachineUpdater: set stopIndex = 5
> 2023-06-27 20:36:27,175 [OM StateMachine ApplyTransaction Thread - 0] INFO  
> snapshot.OMSnapshotCreateRequest 
> (OMSnapshotCreateRequest.java:validateAndUpdateCache(193)) - Created 
> snapshot: 'snap-48567' with snapshotId: 
> '4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa' under path 'vol-47883/buck-56159'
> 2023-06-27 20:36:27,175 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] 
> INFO  ratis.OzoneManagerStateMachine 
> (OzoneManagerStateMachine.java:takeSnapshot(479)) - Current Snapshot Index 
> (t:1, i:4)
> 2023-06-27 20:36:27,178 [OM StateMachine ApplyTransaction Thread - 0] INFO  
> snapshot.OMSnapshotCreateRequest 
> (OMSnapshotCreateRequest.java:validateAndUpdateCache(193)) - Created 
> snapshot: 'snap-48567' with snapshotId: 
> '4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa' under path 'vol-47883/buck-56159'
> 2023-06-27 20:36:27,180 [grpc-default-executor-6] WARN  
> server.GrpcServerProtocolService (LogUtils.java:warn(122)) - omNode-3: 
> APPEND_ENTRIES onError, lastRequest: null: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: client 
> cancelled
> 2023-06-27 20:36:27,180 [grpc-default-executor-2] WARN  
> server.GrpcServerProtocolService (LogUtils.java:warn(122)) - omNode-3: 
> APPEND_ENTRIES onError, lastRequest: omNode-1->omNode-3#38-t1,previous=(t:1, 
> i:5),leaderCommit=5,initializing? true,entries: size=1, first=(t:1, i:6), 
> METADATAENTRY(c:5): 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: client 
> cancelled
> 2023-06-27 20:36:27,180 [grpc-default-executor-5] WARN  
> server.GrpcServerProtocolService (LogUtils.java:warn(122)) - omNode-3: 
> APPEND_ENTRIES onError, lastRequest: null: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: client 
> cancelled
> 2023-06-27 20:36:27,181 [main] INFO  server.GrpcService 
> (GrpcService.java:closeImpl(320)) - omNode-3: shutdown server 
> GrpcServerProtocolService successfully
> 2023-06-27 20:36:27,181 [grpc-default-executor-5] WARN  
> server.GrpcLogAppender (LogUtils.java:warn(122)) - 
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> CANCELLED: RST_STREAM closed stream. HTTP/2 error code: CANCEL
> 2023-06-27 20:36:27,181 [grpc-default-executor-2] WARN  
> server.GrpcLogAppender (LogUtils.java:warn(122)) - 
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> CANCELLED: RST_STREAM closed stream. HTTP/2 error code: CANCEL
> 2023-06-27 20:36:27,181 [grpc-default-executor-6] WARN  
> server.GrpcLogAppender (LogUtils.java:warn(122)) - 
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> CANCELLED: RST_STREAM closed stream. HTTP/2 error code: CANCEL
> 2023-06-27 20:36:27,182 [grpc-default-executor-5] INFO  leader.FollowerInfo 
> (FollowerInfoImpl.java:lambda$new$0(52)) - 
> omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
> updateUnconditionally 7 -> 6
> 2023-06-27 20:36:27,182 [grpc-default-executor-2] INFO  leader.FollowerInfo 
> (FollowerInfoImpl.java:lambda$new$0(52)) - 
> omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
> updateUnconditionally 6 -> 5
> 2023-06-27 20:36:27,182 [grpc-default-executor-6] INFO  leader.FollowerInfo 
> (FollowerInfoImpl.java:lambda$new$0(52)) - 
> omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
> updateUnconditionally 5 -> 4
> 2023-06-27 20:36:27,349 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] 
> INFO  impl.StateMachineUpdater (StateMachineUpdater.java:takeSnapshot(291)) - 
> omNode-3@group-6E8B7B074D31-StateMachineUpdater: Took a snapshot at index 4
> 2023-06-27 20:36:27,349 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] 
> INFO  impl.StateMachineUpdater (StateMachineUpdater.java:lambda$new$0(97)) - 
> omNode-3@group-6E8B7B074D31-StateMachineUpdater: snapshotIndex: 
> updateIncreasingly -1 -> 4
> 2023-06-27 20:36:27,350 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] 
> INFO  ratis.OzoneManagerStateMachine 
> (OzoneManagerStateMachine.java:takeSnapshot(479)) - Current Snapshot Index 
> (t:1, i:4)
> 2023-06-27 20:36:27,365 [OMDoubleBufferFlushThread] INFO  
> db.RDBCheckpointManager (RDBCheckpointManager.java:createCheckpoint(89)) - 
> Created checkpoint in rocksDB at 
> /Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-1/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
>  in 194 milliseconds
> 2023-06-27 20:36:27,379 [OMDoubleBufferFlushThread] INFO  
> db.RDBCheckpointUtils 
> (RDBCheckpointUtils.java:waitForCheckpointDirectoryExist(59)) - Waited for 13 
> milliseconds for checkpoint directory 
> /Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-1/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
>  availability.
> 2023-06-27 20:36:27,381 [OMDoubleBufferFlushThread] INFO  
> om.OmSnapshotManager (OmSnapshotManager.java:createOmSnapshotCheckpoint(468)) 
> - Created checkpoint : 
> /Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-1/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
>  for snapshot snap-48567
> 2023-06-27 20:36:27,392 [OMDoubleBufferFlushThread] INFO  
> db.RDBCheckpointManager (RDBCheckpointManager.java:createCheckpoint(89)) - 
> Created checkpoint in rocksDB at 
> /Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-2/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
>  in 198 milliseconds
> 2023-06-27 20:36:27,392 [OMDoubleBufferFlushThread] INFO  
> db.RDBCheckpointUtils 
> (RDBCheckpointUtils.java:waitForCheckpointDirectoryExist(59)) - Waited for 0 
> milliseconds for checkpoint directory 
> /Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-2/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
>  availability.
> 2023-06-27 20:36:27,393 [OMDoubleBufferFlushThread] INFO  
> om.OmSnapshotManager (OmSnapshotManager.java:createOmSnapshotCheckpoint(468)) 
> - Created checkpoint : 
> /Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-2/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
>  for snapshot snap-48567
> 2023-06-27 20:36:27,396 [OMDoubleBufferFlushThread] INFO  
> db.RDBCheckpointManager (RDBCheckpointManager.java:createCheckpoint(89)) - 
> Created checkpoint in rocksDB at 
> /Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-3/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
>  in 206 milliseconds
> 2023-06-27 20:36:27,396 [OMDoubleBufferFlushThread] INFO  
> db.RDBCheckpointUtils 
> (RDBCheckpointUtils.java:waitForCheckpointDirectoryExist(59)) - Waited for 0 
> milliseconds for checkpoint directory 
> /Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-3/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
>  availability.
> 2023-06-27 20:36:27,397 [OMDoubleBufferFlushThread] INFO  
> om.OmSnapshotManager (OmSnapshotManager.java:createOmSnapshotCheckpoint(468)) 
> - Created checkpoint : 
> /Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-3/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
>  for snapshot snap-48567
> 2023-06-27 20:36:27,399 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] 
> INFO  impl.StateMachineUpdater (StateMachineUpdater.java:takeSnapshot(291)) - 
> omNode-3@group-6E8B7B074D31-StateMachineUpdater: Took a snapshot at index 4
> 2023-06-27 20:36:27,399 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] 
> INFO  ratis.OzoneManagerStateMachine 
> (OzoneManagerStateMachine.java:close(533)) - StateMachine has shutdown. 
> Shutdown OzoneManager if not already shutdown.
> 2023-06-27 20:36:27,399 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] 
> INFO  ratis.OzoneManagerDoubleBuffer 
> (OzoneManagerDoubleBuffer.java:stopDaemon(551)) - Stopping OMDoubleBuffer 
> flush thread
> 2023-06-27 20:36:27,400 [omNode-3-impl-thread1] INFO  
> server.RaftServer$Division (ServerState.java:close(472)) - 
> omNode-3@group-6E8B7B074D31: closes. applyIndex: 5
> 2023-06-27 20:36:27,680 [grpc-default-executor-5] WARN  
> server.GrpcLogAppender (LogUtils.java:warn(122)) - 
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> UNAVAILABLE: io exception
> 2023-06-27 20:36:27,681 [grpc-default-executor-2] WARN  
> server.GrpcLogAppender (LogUtils.java:warn(122)) - 
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> UNAVAILABLE: io exception
> 2023-06-27 20:36:27,681 [grpc-default-executor-5] INFO  leader.FollowerInfo 
> (FollowerInfoImpl.java:lambda$new$0(52)) - 
> omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
> updateUnconditionally 4 -> 3
> 2023-06-27 20:36:27,681 [grpc-default-executor-2] INFO  leader.FollowerInfo 
> (FollowerInfoImpl.java:lambda$new$0(52)) - 
> omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
> updateUnconditionally 3 -> 2
> 2023-06-27 20:36:27,683 [grpc-default-executor-5] WARN  
> server.GrpcLogAppender (LogUtils.java:warn(122)) - 
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> UNAVAILABLE: io exception
> 2023-06-27 20:36:27,683 [grpc-default-executor-2] WARN  
> server.GrpcLogAppender (LogUtils.java:warn(122)) - 
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> UNAVAILABLE: io exception
> 2023-06-27 20:36:27,684 [grpc-default-executor-5] INFO  leader.FollowerInfo 
> (FollowerInfoImpl.java:lambda$new$0(52)) - 
> omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
> updateUnconditionally 7 -> 6
> 2023-06-27 20:36:27,684 [grpc-default-executor-2] INFO  leader.FollowerInfo 
> (FollowerInfoImpl.java:lambda$new$0(52)) - 
> omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
> updateUnconditionally 6 -> 5
> 2023-06-27 20:36:27,690 [grpc-default-executor-2] WARN  
> server.GrpcLogAppender (LogUtils.java:warn(122)) - 
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> UNAVAILABLE: io exception
> 2023-06-27 20:36:27,690 [grpc-default-executor-5] WARN  
> server.GrpcLogAppender (LogUtils.java:warn(122)) - 
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> UNAVAILABLE: io exception
> 2023-06-27 20:36:27,690 [grpc-default-executor-2] INFO  leader.FollowerInfo 
> (FollowerInfoImpl.java:lambda$new$0(52)) - 
> omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
> updateUnconditionally 7 -> 6
> 2023-06-27 20:36:27,690 [grpc-default-executor-5] INFO  leader.FollowerInfo 
> (FollowerInfoImpl.java:lambda$new$0(52)) - 
> omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
> updateUnconditionally 6 -> 5
> 2023-06-27 20:36:27,930 [grpc-default-executor-5] WARN  
> server.GrpcLogAppender (LogUtils.java:warn(122)) - 
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> UNAVAILABLE: io exception
> 2023-06-27 20:36:27,930 [grpc-default-executor-2] WARN  
> server.GrpcLogAppender (LogUtils.java:warn(122)) - 
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> UNAVAILABLE: io exception
> 2023-06-27 20:36:27,932 [grpc-default-executor-5] INFO  leader.FollowerInfo 
> (FollowerInfoImpl.java:lambda$new$0(52)) - 
> omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
> updateUnconditionally 5 -> 4
> 2023-06-27 20:36:27,932 [grpc-default-executor-2] INFO  leader.FollowerInfo 
> (FollowerInfoImpl.java:lambda$new$0(52)) - 
> omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex: 
> updateUnconditionally 4 -> 3
> {code}
> {code}
> 2023-06-27 20:36:57,356 [grpc-default-executor-5] WARN  
> server.GrpcLogAppender (LogUtils.java:warn(122)) - 
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> UNAVAILABLE: io exception
> 2023-06-27 20:36:57,356 [grpc-default-executor-6] WARN  
> server.GrpcLogAppender (LogUtils.java:warn(122)) - 
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> UNAVAILABLE: io exception
> 2023-06-27 20:36:57,607 [grpc-default-executor-6] WARN  
> server.GrpcLogAppender (LogUtils.java:warn(122)) - 
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> UNAVAILABLE: io exception
> 2023-06-27 20:36:57,607 [grpc-default-executor-5] WARN  
> server.GrpcLogAppender (LogUtils.java:warn(122)) - 
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> UNAVAILABLE: io exception
> 2023-06-27 20:36:57,861 [grpc-default-executor-6] WARN  
> server.GrpcLogAppender (LogUtils.java:warn(122)) - 
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> UNAVAILABLE: io exception
> 2023-06-27 20:36:57,861 [grpc-default-executor-5] WARN  
> server.GrpcLogAppender (LogUtils.java:warn(122)) - 
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> UNAVAILABLE: io exception
> 2023-06-27 20:36:58,116 [grpc-default-executor-5] WARN  
> server.GrpcLogAppender (LogUtils.java:warn(122)) - 
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> UNAVAILABLE: io exception
> 2023-06-27 20:36:58,116 [grpc-default-executor-6] WARN  
> server.GrpcLogAppender (LogUtils.java:warn(122)) - 
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed 
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> UNAVAILABLE: io exception
> {code}
> Attached is a diff in MiniOzoneHA to help debug this issue. Apply the diff 
> and run `TestOmSnapshotDisabledRestart#testSnapshotFeatureFlag` to repro.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]


Reply via email to