[
https://issues.apache.org/jira/browse/HDDS-8945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Siyao Meng updated HDDS-8945:
-----------------------------
Description:
The integration test `TestOmSnapshotDisabledRestart` added in HDDS-8555 would
occasionally pass, but most of the time it would fail at the OM restart step
because om.stop() is stuck at Ratis retrying to quit, resulting in OM main
RocksDB LOCK not being released when it is restarting, printing a ton of
{{Failed appendEntries:
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io
exception}} warning message before it times out:
{code}
2023-06-27 20:36:27,174 [omNode-3-impl-thread1] INFO impl.RoleInfo
(RoleInfo.java:shutdownFollowerState(110)) - omNode-3: shutdown
omNode-3@group-6E8B7B074D31-FollowerState
2023-06-27 20:36:27,174 [omNode-3@group-6E8B7B074D31-FollowerState] INFO
impl.FollowerState (FollowerState.java:run(152)) -
omNode-3@group-6E8B7B074D31-FollowerState was interrupted
2023-06-27 20:36:27,174 [omNode-3-impl-thread1] INFO impl.StateMachineUpdater
(StateMachineUpdater.java:stopAndJoin(155)) -
omNode-3@group-6E8B7B074D31-StateMachineUpdater: set stopIndex = 5
2023-06-27 20:36:27,175 [OM StateMachine ApplyTransaction Thread - 0] INFO
snapshot.OMSnapshotCreateRequest
(OMSnapshotCreateRequest.java:validateAndUpdateCache(193)) - Created snapshot:
'snap-48567' with snapshotId: '4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa' under path
'vol-47883/buck-56159'
2023-06-27 20:36:27,175 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] INFO
ratis.OzoneManagerStateMachine
(OzoneManagerStateMachine.java:takeSnapshot(479)) - Current Snapshot Index
(t:1, i:4)
2023-06-27 20:36:27,178 [OM StateMachine ApplyTransaction Thread - 0] INFO
snapshot.OMSnapshotCreateRequest
(OMSnapshotCreateRequest.java:validateAndUpdateCache(193)) - Created snapshot:
'snap-48567' with snapshotId: '4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa' under path
'vol-47883/buck-56159'
2023-06-27 20:36:27,180 [grpc-default-executor-6] WARN
server.GrpcServerProtocolService (LogUtils.java:warn(122)) - omNode-3:
APPEND_ENTRIES onError, lastRequest: null:
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: client
cancelled
2023-06-27 20:36:27,180 [grpc-default-executor-2] WARN
server.GrpcServerProtocolService (LogUtils.java:warn(122)) - omNode-3:
APPEND_ENTRIES onError, lastRequest: omNode-1->omNode-3#38-t1,previous=(t:1,
i:5),leaderCommit=5,initializing? true,entries: size=1, first=(t:1, i:6),
METADATAENTRY(c:5): org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
CANCELLED: client cancelled
2023-06-27 20:36:27,180 [grpc-default-executor-5] WARN
server.GrpcServerProtocolService (LogUtils.java:warn(122)) - omNode-3:
APPEND_ENTRIES onError, lastRequest: null:
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: client
cancelled
2023-06-27 20:36:27,181 [main] INFO server.GrpcService
(GrpcService.java:closeImpl(320)) - omNode-3: shutdown server
GrpcServerProtocolService successfully
2023-06-27 20:36:27,181 [grpc-default-executor-5] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
CANCELLED: RST_STREAM closed stream. HTTP/2 error code: CANCEL
2023-06-27 20:36:27,181 [grpc-default-executor-2] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
CANCELLED: RST_STREAM closed stream. HTTP/2 error code: CANCEL
2023-06-27 20:36:27,181 [grpc-default-executor-6] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
CANCELLED: RST_STREAM closed stream. HTTP/2 error code: CANCEL
2023-06-27 20:36:27,182 [grpc-default-executor-5] INFO leader.FollowerInfo
(FollowerInfoImpl.java:lambda$new$0(52)) -
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
updateUnconditionally 7 -> 6
2023-06-27 20:36:27,182 [grpc-default-executor-2] INFO leader.FollowerInfo
(FollowerInfoImpl.java:lambda$new$0(52)) -
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
updateUnconditionally 6 -> 5
2023-06-27 20:36:27,182 [grpc-default-executor-6] INFO leader.FollowerInfo
(FollowerInfoImpl.java:lambda$new$0(52)) -
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
updateUnconditionally 5 -> 4
2023-06-27 20:36:27,349 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] INFO
impl.StateMachineUpdater (StateMachineUpdater.java:takeSnapshot(291)) -
omNode-3@group-6E8B7B074D31-StateMachineUpdater: Took a snapshot at index 4
2023-06-27 20:36:27,349 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] INFO
impl.StateMachineUpdater (StateMachineUpdater.java:lambda$new$0(97)) -
omNode-3@group-6E8B7B074D31-StateMachineUpdater: snapshotIndex:
updateIncreasingly -1 -> 4
2023-06-27 20:36:27,350 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] INFO
ratis.OzoneManagerStateMachine
(OzoneManagerStateMachine.java:takeSnapshot(479)) - Current Snapshot Index
(t:1, i:4)
2023-06-27 20:36:27,365 [OMDoubleBufferFlushThread] INFO
db.RDBCheckpointManager (RDBCheckpointManager.java:createCheckpoint(89)) -
Created checkpoint in rocksDB at
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-1/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
in 194 milliseconds
2023-06-27 20:36:27,379 [OMDoubleBufferFlushThread] INFO db.RDBCheckpointUtils
(RDBCheckpointUtils.java:waitForCheckpointDirectoryExist(59)) - Waited for 13
milliseconds for checkpoint directory
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-1/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
availability.
2023-06-27 20:36:27,381 [OMDoubleBufferFlushThread] INFO om.OmSnapshotManager
(OmSnapshotManager.java:createOmSnapshotCheckpoint(468)) - Created checkpoint :
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-1/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
for snapshot snap-48567
2023-06-27 20:36:27,392 [OMDoubleBufferFlushThread] INFO
db.RDBCheckpointManager (RDBCheckpointManager.java:createCheckpoint(89)) -
Created checkpoint in rocksDB at
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-2/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
in 198 milliseconds
2023-06-27 20:36:27,392 [OMDoubleBufferFlushThread] INFO db.RDBCheckpointUtils
(RDBCheckpointUtils.java:waitForCheckpointDirectoryExist(59)) - Waited for 0
milliseconds for checkpoint directory
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-2/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
availability.
2023-06-27 20:36:27,393 [OMDoubleBufferFlushThread] INFO om.OmSnapshotManager
(OmSnapshotManager.java:createOmSnapshotCheckpoint(468)) - Created checkpoint :
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-2/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
for snapshot snap-48567
2023-06-27 20:36:27,396 [OMDoubleBufferFlushThread] INFO
db.RDBCheckpointManager (RDBCheckpointManager.java:createCheckpoint(89)) -
Created checkpoint in rocksDB at
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-3/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
in 206 milliseconds
2023-06-27 20:36:27,396 [OMDoubleBufferFlushThread] INFO db.RDBCheckpointUtils
(RDBCheckpointUtils.java:waitForCheckpointDirectoryExist(59)) - Waited for 0
milliseconds for checkpoint directory
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-3/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
availability.
2023-06-27 20:36:27,397 [OMDoubleBufferFlushThread] INFO om.OmSnapshotManager
(OmSnapshotManager.java:createOmSnapshotCheckpoint(468)) - Created checkpoint :
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-3/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
for snapshot snap-48567
2023-06-27 20:36:27,399 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] INFO
impl.StateMachineUpdater (StateMachineUpdater.java:takeSnapshot(291)) -
omNode-3@group-6E8B7B074D31-StateMachineUpdater: Took a snapshot at index 4
2023-06-27 20:36:27,399 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] INFO
ratis.OzoneManagerStateMachine (OzoneManagerStateMachine.java:close(533)) -
StateMachine has shutdown. Shutdown OzoneManager if not already shutdown.
2023-06-27 20:36:27,399 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] INFO
ratis.OzoneManagerDoubleBuffer (OzoneManagerDoubleBuffer.java:stopDaemon(551))
- Stopping OMDoubleBuffer flush thread
2023-06-27 20:36:27,400 [omNode-3-impl-thread1] INFO
server.RaftServer$Division (ServerState.java:close(472)) -
omNode-3@group-6E8B7B074D31: closes. applyIndex: 5
2023-06-27 20:36:27,680 [grpc-default-executor-5] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
2023-06-27 20:36:27,681 [grpc-default-executor-2] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
2023-06-27 20:36:27,681 [grpc-default-executor-5] INFO leader.FollowerInfo
(FollowerInfoImpl.java:lambda$new$0(52)) -
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
updateUnconditionally 4 -> 3
2023-06-27 20:36:27,681 [grpc-default-executor-2] INFO leader.FollowerInfo
(FollowerInfoImpl.java:lambda$new$0(52)) -
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
updateUnconditionally 3 -> 2
2023-06-27 20:36:27,683 [grpc-default-executor-5] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
2023-06-27 20:36:27,683 [grpc-default-executor-2] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
2023-06-27 20:36:27,684 [grpc-default-executor-5] INFO leader.FollowerInfo
(FollowerInfoImpl.java:lambda$new$0(52)) -
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
updateUnconditionally 7 -> 6
2023-06-27 20:36:27,684 [grpc-default-executor-2] INFO leader.FollowerInfo
(FollowerInfoImpl.java:lambda$new$0(52)) -
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
updateUnconditionally 6 -> 5
2023-06-27 20:36:27,690 [grpc-default-executor-2] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
2023-06-27 20:36:27,690 [grpc-default-executor-5] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
2023-06-27 20:36:27,690 [grpc-default-executor-2] INFO leader.FollowerInfo
(FollowerInfoImpl.java:lambda$new$0(52)) -
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
updateUnconditionally 7 -> 6
2023-06-27 20:36:27,690 [grpc-default-executor-5] INFO leader.FollowerInfo
(FollowerInfoImpl.java:lambda$new$0(52)) -
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
updateUnconditionally 6 -> 5
2023-06-27 20:36:27,930 [grpc-default-executor-5] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
2023-06-27 20:36:27,930 [grpc-default-executor-2] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
2023-06-27 20:36:27,932 [grpc-default-executor-5] INFO leader.FollowerInfo
(FollowerInfoImpl.java:lambda$new$0(52)) -
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
updateUnconditionally 5 -> 4
2023-06-27 20:36:27,932 [grpc-default-executor-2] INFO leader.FollowerInfo
(FollowerInfoImpl.java:lambda$new$0(52)) -
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
updateUnconditionally 4 -> 3
{code}
{code}
2023-06-27 20:36:57,356 [grpc-default-executor-5] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
2023-06-27 20:36:57,356 [grpc-default-executor-6] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
2023-06-27 20:36:57,607 [grpc-default-executor-6] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
2023-06-27 20:36:57,607 [grpc-default-executor-5] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
2023-06-27 20:36:57,861 [grpc-default-executor-6] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
2023-06-27 20:36:57,861 [grpc-default-executor-5] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
2023-06-27 20:36:58,116 [grpc-default-executor-5] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
2023-06-27 20:36:58,116 [grpc-default-executor-6] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
{code}
Attached is a diff in MiniOzoneHA to help debug this issue. Apply the diff and
run `TestOmSnapshotDisabledRestart#testSnapshotFeatureFlag` to repro.
was:
The integration test `TestOmSnapshotDisabledRestart` added in HDDS-8555 would
occasionally pass, but most of the time it would fail at the OM restart step
because om.stop() is stuck at Ratis retrying to quit, resulting in OM main
RocksDB LOCK not being released when it is restarting, printing a ton of
{{Failed appendEntries:
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io
exception}} WARM message before it times out:
{code}
2023-06-27 20:36:27,174 [omNode-3-impl-thread1] INFO impl.RoleInfo
(RoleInfo.java:shutdownFollowerState(110)) - omNode-3: shutdown
omNode-3@group-6E8B7B074D31-FollowerState
2023-06-27 20:36:27,174 [omNode-3@group-6E8B7B074D31-FollowerState] INFO
impl.FollowerState (FollowerState.java:run(152)) -
omNode-3@group-6E8B7B074D31-FollowerState was interrupted
2023-06-27 20:36:27,174 [omNode-3-impl-thread1] INFO impl.StateMachineUpdater
(StateMachineUpdater.java:stopAndJoin(155)) -
omNode-3@group-6E8B7B074D31-StateMachineUpdater: set stopIndex = 5
2023-06-27 20:36:27,175 [OM StateMachine ApplyTransaction Thread - 0] INFO
snapshot.OMSnapshotCreateRequest
(OMSnapshotCreateRequest.java:validateAndUpdateCache(193)) - Created snapshot:
'snap-48567' with snapshotId: '4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa' under path
'vol-47883/buck-56159'
2023-06-27 20:36:27,175 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] INFO
ratis.OzoneManagerStateMachine
(OzoneManagerStateMachine.java:takeSnapshot(479)) - Current Snapshot Index
(t:1, i:4)
2023-06-27 20:36:27,178 [OM StateMachine ApplyTransaction Thread - 0] INFO
snapshot.OMSnapshotCreateRequest
(OMSnapshotCreateRequest.java:validateAndUpdateCache(193)) - Created snapshot:
'snap-48567' with snapshotId: '4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa' under path
'vol-47883/buck-56159'
2023-06-27 20:36:27,180 [grpc-default-executor-6] WARN
server.GrpcServerProtocolService (LogUtils.java:warn(122)) - omNode-3:
APPEND_ENTRIES onError, lastRequest: null:
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: client
cancelled
2023-06-27 20:36:27,180 [grpc-default-executor-2] WARN
server.GrpcServerProtocolService (LogUtils.java:warn(122)) - omNode-3:
APPEND_ENTRIES onError, lastRequest: omNode-1->omNode-3#38-t1,previous=(t:1,
i:5),leaderCommit=5,initializing? true,entries: size=1, first=(t:1, i:6),
METADATAENTRY(c:5): org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
CANCELLED: client cancelled
2023-06-27 20:36:27,180 [grpc-default-executor-5] WARN
server.GrpcServerProtocolService (LogUtils.java:warn(122)) - omNode-3:
APPEND_ENTRIES onError, lastRequest: null:
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: client
cancelled
2023-06-27 20:36:27,181 [main] INFO server.GrpcService
(GrpcService.java:closeImpl(320)) - omNode-3: shutdown server
GrpcServerProtocolService successfully
2023-06-27 20:36:27,181 [grpc-default-executor-5] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
CANCELLED: RST_STREAM closed stream. HTTP/2 error code: CANCEL
2023-06-27 20:36:27,181 [grpc-default-executor-2] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
CANCELLED: RST_STREAM closed stream. HTTP/2 error code: CANCEL
2023-06-27 20:36:27,181 [grpc-default-executor-6] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
CANCELLED: RST_STREAM closed stream. HTTP/2 error code: CANCEL
2023-06-27 20:36:27,182 [grpc-default-executor-5] INFO leader.FollowerInfo
(FollowerInfoImpl.java:lambda$new$0(52)) -
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
updateUnconditionally 7 -> 6
2023-06-27 20:36:27,182 [grpc-default-executor-2] INFO leader.FollowerInfo
(FollowerInfoImpl.java:lambda$new$0(52)) -
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
updateUnconditionally 6 -> 5
2023-06-27 20:36:27,182 [grpc-default-executor-6] INFO leader.FollowerInfo
(FollowerInfoImpl.java:lambda$new$0(52)) -
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
updateUnconditionally 5 -> 4
2023-06-27 20:36:27,349 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] INFO
impl.StateMachineUpdater (StateMachineUpdater.java:takeSnapshot(291)) -
omNode-3@group-6E8B7B074D31-StateMachineUpdater: Took a snapshot at index 4
2023-06-27 20:36:27,349 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] INFO
impl.StateMachineUpdater (StateMachineUpdater.java:lambda$new$0(97)) -
omNode-3@group-6E8B7B074D31-StateMachineUpdater: snapshotIndex:
updateIncreasingly -1 -> 4
2023-06-27 20:36:27,350 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] INFO
ratis.OzoneManagerStateMachine
(OzoneManagerStateMachine.java:takeSnapshot(479)) - Current Snapshot Index
(t:1, i:4)
2023-06-27 20:36:27,365 [OMDoubleBufferFlushThread] INFO
db.RDBCheckpointManager (RDBCheckpointManager.java:createCheckpoint(89)) -
Created checkpoint in rocksDB at
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-1/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
in 194 milliseconds
2023-06-27 20:36:27,379 [OMDoubleBufferFlushThread] INFO db.RDBCheckpointUtils
(RDBCheckpointUtils.java:waitForCheckpointDirectoryExist(59)) - Waited for 13
milliseconds for checkpoint directory
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-1/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
availability.
2023-06-27 20:36:27,381 [OMDoubleBufferFlushThread] INFO om.OmSnapshotManager
(OmSnapshotManager.java:createOmSnapshotCheckpoint(468)) - Created checkpoint :
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-1/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
for snapshot snap-48567
2023-06-27 20:36:27,392 [OMDoubleBufferFlushThread] INFO
db.RDBCheckpointManager (RDBCheckpointManager.java:createCheckpoint(89)) -
Created checkpoint in rocksDB at
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-2/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
in 198 milliseconds
2023-06-27 20:36:27,392 [OMDoubleBufferFlushThread] INFO db.RDBCheckpointUtils
(RDBCheckpointUtils.java:waitForCheckpointDirectoryExist(59)) - Waited for 0
milliseconds for checkpoint directory
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-2/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
availability.
2023-06-27 20:36:27,393 [OMDoubleBufferFlushThread] INFO om.OmSnapshotManager
(OmSnapshotManager.java:createOmSnapshotCheckpoint(468)) - Created checkpoint :
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-2/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
for snapshot snap-48567
2023-06-27 20:36:27,396 [OMDoubleBufferFlushThread] INFO
db.RDBCheckpointManager (RDBCheckpointManager.java:createCheckpoint(89)) -
Created checkpoint in rocksDB at
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-3/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
in 206 milliseconds
2023-06-27 20:36:27,396 [OMDoubleBufferFlushThread] INFO db.RDBCheckpointUtils
(RDBCheckpointUtils.java:waitForCheckpointDirectoryExist(59)) - Waited for 0
milliseconds for checkpoint directory
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-3/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
availability.
2023-06-27 20:36:27,397 [OMDoubleBufferFlushThread] INFO om.OmSnapshotManager
(OmSnapshotManager.java:createOmSnapshotCheckpoint(468)) - Created checkpoint :
/Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-3/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
for snapshot snap-48567
2023-06-27 20:36:27,399 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] INFO
impl.StateMachineUpdater (StateMachineUpdater.java:takeSnapshot(291)) -
omNode-3@group-6E8B7B074D31-StateMachineUpdater: Took a snapshot at index 4
2023-06-27 20:36:27,399 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] INFO
ratis.OzoneManagerStateMachine (OzoneManagerStateMachine.java:close(533)) -
StateMachine has shutdown. Shutdown OzoneManager if not already shutdown.
2023-06-27 20:36:27,399 [omNode-3@group-6E8B7B074D31-StateMachineUpdater] INFO
ratis.OzoneManagerDoubleBuffer (OzoneManagerDoubleBuffer.java:stopDaemon(551))
- Stopping OMDoubleBuffer flush thread
2023-06-27 20:36:27,400 [omNode-3-impl-thread1] INFO
server.RaftServer$Division (ServerState.java:close(472)) -
omNode-3@group-6E8B7B074D31: closes. applyIndex: 5
2023-06-27 20:36:27,680 [grpc-default-executor-5] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
2023-06-27 20:36:27,681 [grpc-default-executor-2] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
2023-06-27 20:36:27,681 [grpc-default-executor-5] INFO leader.FollowerInfo
(FollowerInfoImpl.java:lambda$new$0(52)) -
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
updateUnconditionally 4 -> 3
2023-06-27 20:36:27,681 [grpc-default-executor-2] INFO leader.FollowerInfo
(FollowerInfoImpl.java:lambda$new$0(52)) -
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
updateUnconditionally 3 -> 2
2023-06-27 20:36:27,683 [grpc-default-executor-5] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
2023-06-27 20:36:27,683 [grpc-default-executor-2] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
2023-06-27 20:36:27,684 [grpc-default-executor-5] INFO leader.FollowerInfo
(FollowerInfoImpl.java:lambda$new$0(52)) -
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
updateUnconditionally 7 -> 6
2023-06-27 20:36:27,684 [grpc-default-executor-2] INFO leader.FollowerInfo
(FollowerInfoImpl.java:lambda$new$0(52)) -
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
updateUnconditionally 6 -> 5
2023-06-27 20:36:27,690 [grpc-default-executor-2] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
2023-06-27 20:36:27,690 [grpc-default-executor-5] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
2023-06-27 20:36:27,690 [grpc-default-executor-2] INFO leader.FollowerInfo
(FollowerInfoImpl.java:lambda$new$0(52)) -
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
updateUnconditionally 7 -> 6
2023-06-27 20:36:27,690 [grpc-default-executor-5] INFO leader.FollowerInfo
(FollowerInfoImpl.java:lambda$new$0(52)) -
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
updateUnconditionally 6 -> 5
2023-06-27 20:36:27,930 [grpc-default-executor-5] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
2023-06-27 20:36:27,930 [grpc-default-executor-2] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
2023-06-27 20:36:27,932 [grpc-default-executor-5] INFO leader.FollowerInfo
(FollowerInfoImpl.java:lambda$new$0(52)) -
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
updateUnconditionally 5 -> 4
2023-06-27 20:36:27,932 [grpc-default-executor-2] INFO leader.FollowerInfo
(FollowerInfoImpl.java:lambda$new$0(52)) -
omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
updateUnconditionally 4 -> 3
{code}
{code}
2023-06-27 20:36:57,356 [grpc-default-executor-5] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
2023-06-27 20:36:57,356 [grpc-default-executor-6] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
2023-06-27 20:36:57,607 [grpc-default-executor-6] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
2023-06-27 20:36:57,607 [grpc-default-executor-5] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
2023-06-27 20:36:57,861 [grpc-default-executor-6] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
2023-06-27 20:36:57,861 [grpc-default-executor-5] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
2023-06-27 20:36:58,116 [grpc-default-executor-5] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
2023-06-27 20:36:58,116 [grpc-default-executor-6] WARN server.GrpcLogAppender
(LogUtils.java:warn(122)) -
omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
UNAVAILABLE: io exception
{code}
Attached is a diff in MiniOzoneHA to help debug this issue. Apply the diff and
run `TestOmSnapshotDisabledRestart#testSnapshotFeatureFlag` to repro.
> Investigate potential Ratis issue or incorrect usage in OM integrationt test
> ----------------------------------------------------------------------------
>
> Key: HDDS-8945
> URL: https://issues.apache.org/jira/browse/HDDS-8945
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Siyao Meng
> Priority: Major
> Attachments: HDDS-8945.debug.001.diff
>
>
> The integration test `TestOmSnapshotDisabledRestart` added in HDDS-8555 would
> occasionally pass, but most of the time it would fail at the OM restart step
> because om.stop() is stuck at Ratis retrying to quit, resulting in OM main
> RocksDB LOCK not being released when it is restarting, printing a ton of
> {{Failed appendEntries:
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io
> exception}} warning message before it times out:
> {code}
> 2023-06-27 20:36:27,174 [omNode-3-impl-thread1] INFO impl.RoleInfo
> (RoleInfo.java:shutdownFollowerState(110)) - omNode-3: shutdown
> omNode-3@group-6E8B7B074D31-FollowerState
> 2023-06-27 20:36:27,174 [omNode-3@group-6E8B7B074D31-FollowerState] INFO
> impl.FollowerState (FollowerState.java:run(152)) -
> omNode-3@group-6E8B7B074D31-FollowerState was interrupted
> 2023-06-27 20:36:27,174 [omNode-3-impl-thread1] INFO
> impl.StateMachineUpdater (StateMachineUpdater.java:stopAndJoin(155)) -
> omNode-3@group-6E8B7B074D31-StateMachineUpdater: set stopIndex = 5
> 2023-06-27 20:36:27,175 [OM StateMachine ApplyTransaction Thread - 0] INFO
> snapshot.OMSnapshotCreateRequest
> (OMSnapshotCreateRequest.java:validateAndUpdateCache(193)) - Created
> snapshot: 'snap-48567' with snapshotId:
> '4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa' under path 'vol-47883/buck-56159'
> 2023-06-27 20:36:27,175 [omNode-3@group-6E8B7B074D31-StateMachineUpdater]
> INFO ratis.OzoneManagerStateMachine
> (OzoneManagerStateMachine.java:takeSnapshot(479)) - Current Snapshot Index
> (t:1, i:4)
> 2023-06-27 20:36:27,178 [OM StateMachine ApplyTransaction Thread - 0] INFO
> snapshot.OMSnapshotCreateRequest
> (OMSnapshotCreateRequest.java:validateAndUpdateCache(193)) - Created
> snapshot: 'snap-48567' with snapshotId:
> '4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa' under path 'vol-47883/buck-56159'
> 2023-06-27 20:36:27,180 [grpc-default-executor-6] WARN
> server.GrpcServerProtocolService (LogUtils.java:warn(122)) - omNode-3:
> APPEND_ENTRIES onError, lastRequest: null:
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: client
> cancelled
> 2023-06-27 20:36:27,180 [grpc-default-executor-2] WARN
> server.GrpcServerProtocolService (LogUtils.java:warn(122)) - omNode-3:
> APPEND_ENTRIES onError, lastRequest: omNode-1->omNode-3#38-t1,previous=(t:1,
> i:5),leaderCommit=5,initializing? true,entries: size=1, first=(t:1, i:6),
> METADATAENTRY(c:5):
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: client
> cancelled
> 2023-06-27 20:36:27,180 [grpc-default-executor-5] WARN
> server.GrpcServerProtocolService (LogUtils.java:warn(122)) - omNode-3:
> APPEND_ENTRIES onError, lastRequest: null:
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: client
> cancelled
> 2023-06-27 20:36:27,181 [main] INFO server.GrpcService
> (GrpcService.java:closeImpl(320)) - omNode-3: shutdown server
> GrpcServerProtocolService successfully
> 2023-06-27 20:36:27,181 [grpc-default-executor-5] WARN
> server.GrpcLogAppender (LogUtils.java:warn(122)) -
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
> CANCELLED: RST_STREAM closed stream. HTTP/2 error code: CANCEL
> 2023-06-27 20:36:27,181 [grpc-default-executor-2] WARN
> server.GrpcLogAppender (LogUtils.java:warn(122)) -
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
> CANCELLED: RST_STREAM closed stream. HTTP/2 error code: CANCEL
> 2023-06-27 20:36:27,181 [grpc-default-executor-6] WARN
> server.GrpcLogAppender (LogUtils.java:warn(122)) -
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
> CANCELLED: RST_STREAM closed stream. HTTP/2 error code: CANCEL
> 2023-06-27 20:36:27,182 [grpc-default-executor-5] INFO leader.FollowerInfo
> (FollowerInfoImpl.java:lambda$new$0(52)) -
> omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
> updateUnconditionally 7 -> 6
> 2023-06-27 20:36:27,182 [grpc-default-executor-2] INFO leader.FollowerInfo
> (FollowerInfoImpl.java:lambda$new$0(52)) -
> omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
> updateUnconditionally 6 -> 5
> 2023-06-27 20:36:27,182 [grpc-default-executor-6] INFO leader.FollowerInfo
> (FollowerInfoImpl.java:lambda$new$0(52)) -
> omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
> updateUnconditionally 5 -> 4
> 2023-06-27 20:36:27,349 [omNode-3@group-6E8B7B074D31-StateMachineUpdater]
> INFO impl.StateMachineUpdater (StateMachineUpdater.java:takeSnapshot(291)) -
> omNode-3@group-6E8B7B074D31-StateMachineUpdater: Took a snapshot at index 4
> 2023-06-27 20:36:27,349 [omNode-3@group-6E8B7B074D31-StateMachineUpdater]
> INFO impl.StateMachineUpdater (StateMachineUpdater.java:lambda$new$0(97)) -
> omNode-3@group-6E8B7B074D31-StateMachineUpdater: snapshotIndex:
> updateIncreasingly -1 -> 4
> 2023-06-27 20:36:27,350 [omNode-3@group-6E8B7B074D31-StateMachineUpdater]
> INFO ratis.OzoneManagerStateMachine
> (OzoneManagerStateMachine.java:takeSnapshot(479)) - Current Snapshot Index
> (t:1, i:4)
> 2023-06-27 20:36:27,365 [OMDoubleBufferFlushThread] INFO
> db.RDBCheckpointManager (RDBCheckpointManager.java:createCheckpoint(89)) -
> Created checkpoint in rocksDB at
> /Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-1/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
> in 194 milliseconds
> 2023-06-27 20:36:27,379 [OMDoubleBufferFlushThread] INFO
> db.RDBCheckpointUtils
> (RDBCheckpointUtils.java:waitForCheckpointDirectoryExist(59)) - Waited for 13
> milliseconds for checkpoint directory
> /Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-1/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
> availability.
> 2023-06-27 20:36:27,381 [OMDoubleBufferFlushThread] INFO
> om.OmSnapshotManager (OmSnapshotManager.java:createOmSnapshotCheckpoint(468))
> - Created checkpoint :
> /Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-1/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
> for snapshot snap-48567
> 2023-06-27 20:36:27,392 [OMDoubleBufferFlushThread] INFO
> db.RDBCheckpointManager (RDBCheckpointManager.java:createCheckpoint(89)) -
> Created checkpoint in rocksDB at
> /Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-2/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
> in 198 milliseconds
> 2023-06-27 20:36:27,392 [OMDoubleBufferFlushThread] INFO
> db.RDBCheckpointUtils
> (RDBCheckpointUtils.java:waitForCheckpointDirectoryExist(59)) - Waited for 0
> milliseconds for checkpoint directory
> /Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-2/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
> availability.
> 2023-06-27 20:36:27,393 [OMDoubleBufferFlushThread] INFO
> om.OmSnapshotManager (OmSnapshotManager.java:createOmSnapshotCheckpoint(468))
> - Created checkpoint :
> /Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-2/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
> for snapshot snap-48567
> 2023-06-27 20:36:27,396 [OMDoubleBufferFlushThread] INFO
> db.RDBCheckpointManager (RDBCheckpointManager.java:createCheckpoint(89)) -
> Created checkpoint in rocksDB at
> /Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-3/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
> in 206 milliseconds
> 2023-06-27 20:36:27,396 [OMDoubleBufferFlushThread] INFO
> db.RDBCheckpointUtils
> (RDBCheckpointUtils.java:waitForCheckpointDirectoryExist(59)) - Waited for 0
> milliseconds for checkpoint directory
> /Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-3/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
> availability.
> 2023-06-27 20:36:27,397 [OMDoubleBufferFlushThread] INFO
> om.OmSnapshotManager (OmSnapshotManager.java:createOmSnapshotCheckpoint(468))
> - Created checkpoint :
> /Users/smeng/repo/ozone-master/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-5ed4ff4b-1eeb-48c3-bfe1-e68bac0a1195/omNode-3/db.snapshots/checkpointState/om.db-4714bd6c-6f41-4d7e-8a8e-f31f8cb18aaa
> for snapshot snap-48567
> 2023-06-27 20:36:27,399 [omNode-3@group-6E8B7B074D31-StateMachineUpdater]
> INFO impl.StateMachineUpdater (StateMachineUpdater.java:takeSnapshot(291)) -
> omNode-3@group-6E8B7B074D31-StateMachineUpdater: Took a snapshot at index 4
> 2023-06-27 20:36:27,399 [omNode-3@group-6E8B7B074D31-StateMachineUpdater]
> INFO ratis.OzoneManagerStateMachine
> (OzoneManagerStateMachine.java:close(533)) - StateMachine has shutdown.
> Shutdown OzoneManager if not already shutdown.
> 2023-06-27 20:36:27,399 [omNode-3@group-6E8B7B074D31-StateMachineUpdater]
> INFO ratis.OzoneManagerDoubleBuffer
> (OzoneManagerDoubleBuffer.java:stopDaemon(551)) - Stopping OMDoubleBuffer
> flush thread
> 2023-06-27 20:36:27,400 [omNode-3-impl-thread1] INFO
> server.RaftServer$Division (ServerState.java:close(472)) -
> omNode-3@group-6E8B7B074D31: closes. applyIndex: 5
> 2023-06-27 20:36:27,680 [grpc-default-executor-5] WARN
> server.GrpcLogAppender (LogUtils.java:warn(122)) -
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
> UNAVAILABLE: io exception
> 2023-06-27 20:36:27,681 [grpc-default-executor-2] WARN
> server.GrpcLogAppender (LogUtils.java:warn(122)) -
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
> UNAVAILABLE: io exception
> 2023-06-27 20:36:27,681 [grpc-default-executor-5] INFO leader.FollowerInfo
> (FollowerInfoImpl.java:lambda$new$0(52)) -
> omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
> updateUnconditionally 4 -> 3
> 2023-06-27 20:36:27,681 [grpc-default-executor-2] INFO leader.FollowerInfo
> (FollowerInfoImpl.java:lambda$new$0(52)) -
> omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
> updateUnconditionally 3 -> 2
> 2023-06-27 20:36:27,683 [grpc-default-executor-5] WARN
> server.GrpcLogAppender (LogUtils.java:warn(122)) -
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
> UNAVAILABLE: io exception
> 2023-06-27 20:36:27,683 [grpc-default-executor-2] WARN
> server.GrpcLogAppender (LogUtils.java:warn(122)) -
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
> UNAVAILABLE: io exception
> 2023-06-27 20:36:27,684 [grpc-default-executor-5] INFO leader.FollowerInfo
> (FollowerInfoImpl.java:lambda$new$0(52)) -
> omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
> updateUnconditionally 7 -> 6
> 2023-06-27 20:36:27,684 [grpc-default-executor-2] INFO leader.FollowerInfo
> (FollowerInfoImpl.java:lambda$new$0(52)) -
> omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
> updateUnconditionally 6 -> 5
> 2023-06-27 20:36:27,690 [grpc-default-executor-2] WARN
> server.GrpcLogAppender (LogUtils.java:warn(122)) -
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
> UNAVAILABLE: io exception
> 2023-06-27 20:36:27,690 [grpc-default-executor-5] WARN
> server.GrpcLogAppender (LogUtils.java:warn(122)) -
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
> UNAVAILABLE: io exception
> 2023-06-27 20:36:27,690 [grpc-default-executor-2] INFO leader.FollowerInfo
> (FollowerInfoImpl.java:lambda$new$0(52)) -
> omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
> updateUnconditionally 7 -> 6
> 2023-06-27 20:36:27,690 [grpc-default-executor-5] INFO leader.FollowerInfo
> (FollowerInfoImpl.java:lambda$new$0(52)) -
> omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
> updateUnconditionally 6 -> 5
> 2023-06-27 20:36:27,930 [grpc-default-executor-5] WARN
> server.GrpcLogAppender (LogUtils.java:warn(122)) -
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
> UNAVAILABLE: io exception
> 2023-06-27 20:36:27,930 [grpc-default-executor-2] WARN
> server.GrpcLogAppender (LogUtils.java:warn(122)) -
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
> UNAVAILABLE: io exception
> 2023-06-27 20:36:27,932 [grpc-default-executor-5] INFO leader.FollowerInfo
> (FollowerInfoImpl.java:lambda$new$0(52)) -
> omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
> updateUnconditionally 5 -> 4
> 2023-06-27 20:36:27,932 [grpc-default-executor-2] INFO leader.FollowerInfo
> (FollowerInfoImpl.java:lambda$new$0(52)) -
> omNode-1@group-6E8B7B074D31->omNode-3: decreaseNextIndex nextIndex:
> updateUnconditionally 4 -> 3
> {code}
> {code}
> 2023-06-27 20:36:57,356 [grpc-default-executor-5] WARN
> server.GrpcLogAppender (LogUtils.java:warn(122)) -
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
> UNAVAILABLE: io exception
> 2023-06-27 20:36:57,356 [grpc-default-executor-6] WARN
> server.GrpcLogAppender (LogUtils.java:warn(122)) -
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
> UNAVAILABLE: io exception
> 2023-06-27 20:36:57,607 [grpc-default-executor-6] WARN
> server.GrpcLogAppender (LogUtils.java:warn(122)) -
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
> UNAVAILABLE: io exception
> 2023-06-27 20:36:57,607 [grpc-default-executor-5] WARN
> server.GrpcLogAppender (LogUtils.java:warn(122)) -
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
> UNAVAILABLE: io exception
> 2023-06-27 20:36:57,861 [grpc-default-executor-6] WARN
> server.GrpcLogAppender (LogUtils.java:warn(122)) -
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
> UNAVAILABLE: io exception
> 2023-06-27 20:36:57,861 [grpc-default-executor-5] WARN
> server.GrpcLogAppender (LogUtils.java:warn(122)) -
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
> UNAVAILABLE: io exception
> 2023-06-27 20:36:58,116 [grpc-default-executor-5] WARN
> server.GrpcLogAppender (LogUtils.java:warn(122)) -
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
> UNAVAILABLE: io exception
> 2023-06-27 20:36:58,116 [grpc-default-executor-6] WARN
> server.GrpcLogAppender (LogUtils.java:warn(122)) -
> omNode-1@group-6E8B7B074D31->omNode-3-AppendLogResponseHandler: Failed
> appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
> UNAVAILABLE: io exception
> {code}
> Attached is a diff in MiniOzoneHA to help debug this issue. Apply the diff
> and run `TestOmSnapshotDisabledRestart#testSnapshotFeatureFlag` to repro.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]