[
https://issues.apache.org/jira/browse/HDDS-10924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17850212#comment-17850212
]
Duong commented on HDDS-10924:
------------------------------
This looks related to RATIS-2045.
When a follower SCM is added to a leader with an empty backlog, the leader asks
the follower to install an (empty) snapshot. At that point,
scm.getSCMHANodeDetails().getPeerNodeDetails() is empty, and
SCMStateMachine.notifyInstallSnapshotFromLeader cannot handle that case: its
precondition check throws IllegalStateException (see the stack trace below).
With the previous Ratis version, no snapshot installation was requested in this
scenario, which is why the test used to pass.
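
To illustrate the failure mode, here is a minimal, self-contained sketch. It
does not use the Ozone or Ratis classes: PeerSketch, InstallSnapshotSketch,
findLeaderStrict and findLeader are hypothetical names made up for this
example, and the lenient variant is only one possible way to tolerate an empty
peer list, not necessarily the fix that will be applied.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

/** Hypothetical stand-in for one entry of getPeerNodeDetails(); not an Ozone class. */
final class PeerSketch {
  final String nodeId;
  final String address;

  PeerSketch(String nodeId, String address) {
    this.nodeId = nodeId;
    this.address = address;
  }
}

/** Hypothetical helper; it only mimics the shape of the leader lookup. */
final class InstallSnapshotSketch {
  private InstallSnapshotSketch() { }

  /** Lenient lookup: an empty or incomplete peer list yields Optional.empty(). */
  static Optional<PeerSketch> findLeader(List<PeerSketch> peers, String leaderId) {
    return peers.stream().filter(p -> p.nodeId.equals(leaderId)).findFirst();
  }

  /** Strict lookup: throws IllegalStateException when the leader is unknown. */
  static PeerSketch findLeaderStrict(List<PeerSketch> peers, String leaderId) {
    return findLeader(peers, leaderId).orElseThrow(
        () -> new IllegalStateException("Leader " + leaderId + " is not in the peer list"));
  }

  public static void main(String[] args) {
    // A freshly added follower may not know any peers yet.
    List<PeerSketch> noPeers = Collections.emptyList();
    List<PeerSketch> onePeer = Arrays.asList(new PeerSketch("leader", "localhost:9894"));

    // Lenient path: the caller can decide what to do when nothing is found.
    System.out.println("lenient, empty list: " + findLeader(noPeers, "leader").isPresent());
    System.out.println("lenient, one peer:   " + findLeader(onePeer, "leader").isPresent());

    // Strict path: fails in the same way as a precondition check would.
    try {
      findLeaderStrict(noPeers, "leader");
    } catch (IllegalStateException e) {
      System.out.println("strict, empty list:  " + e.getMessage());
    }
  }
}
```

With the strict lookup, the empty peer list surfaces as an exception in the
snapshot notification path; with the lenient lookup, the caller gets an empty
result and can choose how to respond.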
> TestSCMHAManagerImpl#testAddSCM fails on ratis master
> -----------------------------------------------------
>
> Key: HDDS-10924
> URL: https://issues.apache.org/jira/browse/HDDS-10924
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Duong
> Priority: Major
>
> TestSCMHAManagerImpl#testAddSCM fails on ratis master with the following
> errors.
> {code:java}
> org.apache.ratis.protocol.exceptions.ReconfigurationTimeoutException:
> 1c87d01a-bf32-4066-b42b-826ac82dfd8f@group-16EE0E876D68-ConfigurationStagingState:
> Fail to set configuration peers:[follower|localhost:9898,
> 1c87d01a-bf32-4066-b42b-826ac82dfd8f|localhost:9894]|listeners:[] due to
> NOPROGRESS
>   at org.apache.ratis.server.impl.LeaderStateImpl$ConfigurationStagingState.fail(LeaderStateImpl.java:1262)
>   at org.apache.ratis.server.impl.LeaderStateImpl.checkStaging(LeaderStateImpl.java:842)
>   at org.apache.ratis.server.impl.LeaderStateImpl.access$700(LeaderStateImpl.java:101)
>   at org.apache.ratis.server.impl.LeaderStateImpl$EventProcessor.run(LeaderStateImpl.java:774)
> {code}
> Snapshot installation failed with an IllegalStateException.
> {code:java}
> 2024-05-28 19:54:26,832 [grpc-default-executor-0] ERROR
> impl.SnapshotInstallationHandler
> (SnapshotInstallationHandler.java:installSnapshot(99)) -
> 388e333f-7af0-4453-8dd3-61733cda944d@group-16EE0E876D68: installSnapshot
> failed
> java.lang.IllegalStateException
>   at com.google.common.base.Preconditions.checkState(Preconditions.java:496)
>   at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.notifyInstallSnapshotFromLeader(SCMStateMachine.java:239)
>   at org.apache.ratis.server.impl.SnapshotInstallationHandler.notifyStateMachineToInstallSnapshot(SnapshotInstallationHandler.java:267)
>   at org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshotImpl(SnapshotInstallationHandler.java:128)
>   at org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshot(SnapshotInstallationHandler.java:97)
>   at org.apache.ratis.server.impl.RaftServerImpl.installSnapshot(RaftServerImpl.java:1652)
>   at org.apache.ratis.server.impl.RaftServerProxy.installSnapshot(RaftServerProxy.java:675)
>   at org.apache.ratis.grpc.server.GrpcServerProtocolService$2.process(GrpcServerProtocolService.java:349)
>   at org.apache.ratis.grpc.server.GrpcServerProtocolService$2.process(GrpcServerProtocolService.java:346)
>   at org.apache.ratis.grpc.server.GrpcServerProtocolService$ServerRequestStreamObserver.process(GrpcServerProtocolService.java:106)
>   at org.apache.ratis.grpc.server.GrpcServerProtocolService$ServerRequestStreamObserver.onNext(GrpcServerProtocolService.java:174)
>   at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:262)
>   at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailableInternal(ServerCallImpl.java:329)
>   at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:314)
>   at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:833)
>   at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>   at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
>   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>   at java.base/java.lang.Thread.run(Thread.java:840)
> {code}
> Then the following log messages repeat forever.
> {code:java}
> 2024-05-28 15:25:57,037 [grpc-default-executor-0] INFO
> server.GrpcLogAppender (GrpcLogAppender.java:onNext(674)) -
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845@group-A959FA14CAD1->follower-InstallSnapshotResponseHandler:
> InstallSnapshot in progress.
> 2024-05-28 15:25:57,037 [51eedc82-c249-4ac2-bdcd-05a70b0f674c-server-thread1]
> INFO server.RaftServer$Division
> (RaftServerImpl.java:checkInconsistentAppendEntries(1621)) -
> 51eedc82-c249-4ac2-bdcd-05a70b0f674c@group-A959FA14CAD1: Failed appendEntries
> as snapshot (0) installation is in progress
> 2024-05-28 15:25:57,037 [51eedc82-c249-4ac2-bdcd-05a70b0f674c-server-thread1]
> INFO server.RaftServer$Division
> (RaftServerImpl.java:appendEntriesAsync(1575)) -
> 51eedc82-c249-4ac2-bdcd-05a70b0f674c@group-A959FA14CAD1: appendEntries* reply
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845<-51eedc82-c249-4ac2-bdcd-05a70b0f674c#127986:FAIL-t2,INCONSISTENCY,nextIndex=0,followerCommit=-1,matchIndex=-1
> 2024-05-28 15:25:57,037 [grpc-default-executor-0] WARN
> server.GrpcLogAppender (GrpcLogAppender.java:onNextImpl(528)) -
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845@group-A959FA14CAD1->follower-AppendLogResponseHandler:
> received INCONSISTENCY reply with nextIndex 0, errorCount=1,
> request=AppendEntriesRequest:cid=127986,entriesCount=0
> 2024-05-28 15:25:57,037
> [b6dabbe9-8da8-4143-a83f-7f5b0eb07845@group-A959FA14CAD1->follower-GrpcLogAppender-LogAppenderDaemon]
> INFO server.GrpcLogAppender
> (GrpcLogAppender.java:notifyInstallSnapshot(799)) -
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845@group-A959FA14CAD1->follower-GrpcLogAppender:
> notifyInstallSnapshot with firstAvailable=(t:1, i:0), followerNextIndex=0
> 2024-05-28 15:25:57,037
> [b6dabbe9-8da8-4143-a83f-7f5b0eb07845@group-A959FA14CAD1->follower-GrpcLogAppender-LogAppenderDaemon]
> INFO server.GrpcLogAppender
> (GrpcLogAppender.java:notifyInstallSnapshot(807)) -
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845@group-A959FA14CAD1->follower-GrpcLogAppender:
> send b6dabbe9-8da8-4143-a83f-7f5b0eb07845->follower#0-t2,notify:(t:1, i:0)
> 2024-05-28 15:25:57,037 [grpc-default-executor-0] INFO
> impl.SnapshotInstallationHandler
> (SnapshotInstallationHandler.java:installSnapshot(92)) -
> 51eedc82-c249-4ac2-bdcd-05a70b0f674c@group-A959FA14CAD1: receive
> installSnapshot:
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845->follower#0-t2,notify:(t:1, i:0)
> 2024-05-28 15:25:57,037 [grpc-default-executor-0] INFO
> impl.SnapshotInstallationHandler
> (SnapshotInstallationHandler.java:installSnapshot(103)) -
> 51eedc82-c249-4ac2-bdcd-05a70b0f674c@group-A959FA14CAD1: reply
> installSnapshot:
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845<-51eedc82-c249-4ac2-bdcd-05a70b0f674c#0:FAIL-t2,IN_PROGRESS,snapshotIndex=0
> 2024-05-28 15:25:57,037 [grpc-default-executor-0] INFO
> server.GrpcServerProtocolService
> (GrpcServerProtocolService.java:onCompleted(200)) -
> 51eedc82-c249-4ac2-bdcd-05a70b0f674c: Completed INSTALL_SNAPSHOT,
> lastRequest: b6dabbe9-8da8-4143-a83f-7f5b0eb07845->follower#0-t2,notify:(t:1,
> i:0)
> 2024-05-28 15:25:57,037 [grpc-default-executor-0] INFO
> server.GrpcServerProtocolService
> (GrpcServerProtocolService.java:lambda$onCompleted$7(202)) -
> 51eedc82-c249-4ac2-bdcd-05a70b0f674c: Completed INSTALL_SNAPSHOT, lastReply:
> null
> 2024-05-28 15:25:57,037 [grpc-default-executor-5] INFO
> server.GrpcLogAppender (GrpcLogAppender.java:onNext(658)) -
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845@group-A959FA14CAD1->follower-InstallSnapshotResponseHandler:
> received a reply
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845<-51eedc82-c249-4ac2-bdcd-05a70b0f674c#0:FAIL-t2,IN_PROGRESS,snapshotIndex=0
> 2024-05-28 15:25:57,037 [grpc-default-executor-5] INFO
> server.GrpcLogAppender (GrpcLogAppender.java:onNext(674)) -
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845@group-A959FA14CAD1->follower-InstallSnapshotResponseHandler:
> InstallSnapshot in progress.
> 2024-05-28 15:25:57,037 [51eedc82-c249-4ac2-bdcd-05a70b0f674c-server-thread1]
> INFO server.RaftServer$Division
> (RaftServerImpl.java:checkInconsistentAppendEntries(1621)) -
> 51eedc82-c249-4ac2-bdcd-05a70b0f674c@group-A959FA14CAD1: Failed appendEntries
> as snapshot (0) installation is in progress
> 2024-05-28 15:25:57,037 [51eedc82-c249-4ac2-bdcd-05a70b0f674c-server-thread1]
> INFO server.RaftServer$Division
> (RaftServerImpl.java:appendEntriesAsync(1575)) -
> 51eedc82-c249-4ac2-bdcd-05a70b0f674c@group-A959FA14CAD1: appendEntries* reply
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845<-51eedc82-c249-4ac2-bdcd-05a70b0f674c#127987:FAIL-t2,INCONSISTENCY,nextIndex=0,followerCommit=-1,matchIndex=-1
> 2024-05-28 15:25:57,038 [grpc-default-executor-5] WARN
> server.GrpcLogAppender (GrpcLogAppender.java:onNextImpl(528)) -
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845@group-A959FA14CAD1->follower-AppendLogResponseHandler:
> received INCONSISTENCY reply with nextIndex 0, errorCount=1,
> request=AppendEntriesRequest:cid=127987,entriesCount=0
> 2024-05-28 15:25:57,038
> [b6dabbe9-8da8-4143-a83f-7f5b0eb07845@group-A959FA14CAD1->follower-GrpcLogAppender-LogAppenderDaemon]
> INFO server.GrpcLogAppender
> (GrpcLogAppender.java:notifyInstallSnapshot(799)) -
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845@group-A959FA14CAD1->follower-GrpcLogAppender:
> notifyInstallSnapshot with firstAvailable=(t:1, i:0), followerNextIndex=0
> 2024-05-28 15:25:57,038
> [b6dabbe9-8da8-4143-a83f-7f5b0eb07845@group-A959FA14CAD1->follower-GrpcLogAppender-LogAppenderDaemon]
> INFO server.GrpcLogAppender
> (GrpcLogAppender.java:notifyInstallSnapshot(807)) -
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845@group-A959FA14CAD1->follower-GrpcLogAppender:
> send b6dabbe9-8da8-4143-a83f-7f5b0eb07845->follower#0-t2,notify:(t:1, i:0)
> 2024-05-28 15:25:57,038 [grpc-default-executor-5] INFO
> impl.SnapshotInstallationHandler
> (SnapshotInstallationHandler.java:installSnapshot(92)) -
> 51eedc82-c249-4ac2-bdcd-05a70b0f674c@group-A959FA14CAD1: receive
> installSnapshot:
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845->follower#0-t2,notify:(t:1, i:0)
> 2024-05-28 15:25:57,038 [grpc-default-executor-5] INFO
> impl.SnapshotInstallationHandler
> (SnapshotInstallationHandler.java:installSnapshot(103)) -
> 51eedc82-c249-4ac2-bdcd-05a70b0f674c@group-A959FA14CAD1: reply
> installSnapshot:
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845<-51eedc82-c249-4ac2-bdcd-05a70b0f674c#0:FAIL-t2,IN_PROGRESS,snapshotIndex=0
> 2024-05-28 15:25:57,038 [grpc-default-executor-5] INFO
> server.GrpcServerProtocolService
> (GrpcServerProtocolService.java:onCompleted(200)) -
> 51eedc82-c249-4ac2-bdcd-05a70b0f674c: Completed INSTALL_SNAPSHOT,
> lastRequest: b6dabbe9-8da8-4143-a83f-7f5b0eb07845->follower#0-t2,notify:(t:1,
> i:0)
> 2024-05-28 15:25:57,038 [grpc-default-executor-5] INFO
> server.GrpcServerProtocolService
> (GrpcServerProtocolService.java:lambda$onCompleted$7(202)) -
> 51eedc82-c249-4ac2-bdcd-05a70b0f674c: Completed INSTALL_SNAPSHOT, lastReply:
> null
> 2024-05-28 15:25:57,038 [grpc-default-executor-0] INFO
> server.GrpcLogAppender (GrpcLogAppender.java:onNext(658)) -
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845@group-A959FA14CAD1->follower-InstallSnapshotResponseHandler:
> received a reply
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845<-51eedc82-c249-4ac2-bdcd-05a70b0f674c#0:FAIL-t2,IN_PROGRESS,snapshotIndex=0
> {code}