[ 
https://issues.apache.org/jira/browse/HDDS-10924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17850212#comment-17850212
 ] 

Duong commented on HDDS-10924:
------------------------------

Looks like this is related to RATIS-2045.

When a follower SCM is added to a leader with an empty backlog, the leader asks 
the follower to install an (empty) snapshot. At this moment, 

scm.getSCMHANodeDetails().getPeerNodeDetails() is empty and 
SCMStateMachine.notifyInstallSnapshotFromLeader can't handle it.

In the previous ratis version, no snapshot installation was required, so the 
test passed before. 

> TestSCMHAManagerImpl#testAddSCM fails on ratis master
> -----------------------------------------------------
>
>                 Key: HDDS-10924
>                 URL: https://issues.apache.org/jira/browse/HDDS-10924
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Duong
>            Priority: Major
>
> TestSCMHAManagerImpl#testAddSCM fails on ratis master with the following 
> errors.
> {code:java}
> org.apache.ratis.protocol.exceptions.ReconfigurationTimeoutException: 
> 1c87d01a-bf32-4066-b42b-826ac82dfd8f@group-16EE0E876D68-ConfigurationStagingState:
>  Fail to set configuration peers:[follower|localhost:9898, 
> 1c87d01a-bf32-4066-b42b-826ac82dfd8f|localhost:9894]|listeners:[] due to 
> NOPROGRESS
> 1546  at 
> org.apache.ratis.server.impl.LeaderStateImpl$ConfigurationStagingState.fail(LeaderStateImpl.java:1262)
> 1547  at 
> org.apache.ratis.server.impl.LeaderStateImpl.checkStaging(LeaderStateImpl.java:842)
> 1548  at 
> org.apache.ratis.server.impl.LeaderStateImpl.access$700(LeaderStateImpl.java:101)
> 1549  at 
> org.apache.ratis.server.impl.LeaderStateImpl$EventProcessor.run(LeaderStateImpl.java:774)
> 1550 {code}
> Snapshot installation failed because of IllegalStateException.
> {code:java}
> 2024-05-28 19:54:26,832 [grpc-default-executor-0] ERROR 
> impl.SnapshotInstallationHandler 
> (SnapshotInstallationHandler.java:installSnapshot(99)) - 
> 388e333f-7af0-4453-8dd3-61733cda944d@group-16EE0E876D68: installSnapshot 
> failed
> java.lang.IllegalStateException
>     at com.google.common.base.Preconditions.checkState(Preconditions.java:496)
>     at 
> org.apache.hadoop.hdds.scm.ha.SCMStateMachine.notifyInstallSnapshotFromLeader(SCMStateMachine.java:239)
>     at 
> org.apache.ratis.server.impl.SnapshotInstallationHandler.notifyStateMachineToInstallSnapshot(SnapshotInstallationHandler.java:267)
>     at 
> org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshotImpl(SnapshotInstallationHandler.java:128)
>     at 
> org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshot(SnapshotInstallationHandler.java:97)
>     at 
> org.apache.ratis.server.impl.RaftServerImpl.installSnapshot(RaftServerImpl.java:1652)
>     at 
> org.apache.ratis.server.impl.RaftServerProxy.installSnapshot(RaftServerProxy.java:675)
>     at 
> org.apache.ratis.grpc.server.GrpcServerProtocolService$2.process(GrpcServerProtocolService.java:349)
>     at 
> org.apache.ratis.grpc.server.GrpcServerProtocolService$2.process(GrpcServerProtocolService.java:346)
>     at 
> org.apache.ratis.grpc.server.GrpcServerProtocolService$ServerRequestStreamObserver.process(GrpcServerProtocolService.java:106)
>     at 
> org.apache.ratis.grpc.server.GrpcServerProtocolService$ServerRequestStreamObserver.onNext(GrpcServerProtocolService.java:174)
>     at 
> org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:262)
>     at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailableInternal(ServerCallImpl.java:329)
>     at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:314)
>     at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:833)
>     at 
> org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>     at 
> org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>     at java.base/java.lang.Thread.run(Thread.java:840) {code}
> Then the following log messages repeat forever.
> {code:java}
> 2024-05-28 15:25:57,037 [grpc-default-executor-0] INFO  
> server.GrpcLogAppender (GrpcLogAppender.java:onNext(674)) - 
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845@group-A959FA14CAD1->follower-InstallSnapshotResponseHandler:
>  InstallSnapshot in progress.
> 2024-05-28 15:25:57,037 [51eedc82-c249-4ac2-bdcd-05a70b0f674c-server-thread1] 
> INFO  server.RaftServer$Division 
> (RaftServerImpl.java:checkInconsistentAppendEntries(1621)) - 
> 51eedc82-c249-4ac2-bdcd-05a70b0f674c@group-A959FA14CAD1: Failed appendEntries 
> as snapshot (0) installation is in progress
> 2024-05-28 15:25:57,037 [51eedc82-c249-4ac2-bdcd-05a70b0f674c-server-thread1] 
> INFO  server.RaftServer$Division 
> (RaftServerImpl.java:appendEntriesAsync(1575)) - 
> 51eedc82-c249-4ac2-bdcd-05a70b0f674c@group-A959FA14CAD1: appendEntries* reply 
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845<-51eedc82-c249-4ac2-bdcd-05a70b0f674c#127986:FAIL-t2,INCONSISTENCY,nextIndex=0,followerCommit=-1,matchIndex=-1
> 2024-05-28 15:25:57,037 [grpc-default-executor-0] WARN  
> server.GrpcLogAppender (GrpcLogAppender.java:onNextImpl(528)) - 
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845@group-A959FA14CAD1->follower-AppendLogResponseHandler:
>  received INCONSISTENCY reply with nextIndex 0, errorCount=1, 
> request=AppendEntriesRequest:cid=127986,entriesCount=0
> 2024-05-28 15:25:57,037 
> [b6dabbe9-8da8-4143-a83f-7f5b0eb07845@group-A959FA14CAD1->follower-GrpcLogAppender-LogAppenderDaemon]
>  INFO  server.GrpcLogAppender 
> (GrpcLogAppender.java:notifyInstallSnapshot(799)) - 
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845@group-A959FA14CAD1->follower-GrpcLogAppender:
>  notifyInstallSnapshot with firstAvailable=(t:1, i:0), followerNextIndex=0
> 2024-05-28 15:25:57,037 
> [b6dabbe9-8da8-4143-a83f-7f5b0eb07845@group-A959FA14CAD1->follower-GrpcLogAppender-LogAppenderDaemon]
>  INFO  server.GrpcLogAppender 
> (GrpcLogAppender.java:notifyInstallSnapshot(807)) - 
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845@group-A959FA14CAD1->follower-GrpcLogAppender:
>  send b6dabbe9-8da8-4143-a83f-7f5b0eb07845->follower#0-t2,notify:(t:1, i:0)
> 2024-05-28 15:25:57,037 [grpc-default-executor-0] INFO  
> impl.SnapshotInstallationHandler 
> (SnapshotInstallationHandler.java:installSnapshot(92)) - 
> 51eedc82-c249-4ac2-bdcd-05a70b0f674c@group-A959FA14CAD1: receive 
> installSnapshot: 
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845->follower#0-t2,notify:(t:1, i:0)
> 2024-05-28 15:25:57,037 [grpc-default-executor-0] INFO  
> impl.SnapshotInstallationHandler 
> (SnapshotInstallationHandler.java:installSnapshot(103)) - 
> 51eedc82-c249-4ac2-bdcd-05a70b0f674c@group-A959FA14CAD1: reply 
> installSnapshot: 
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845<-51eedc82-c249-4ac2-bdcd-05a70b0f674c#0:FAIL-t2,IN_PROGRESS,snapshotIndex=0
> 2024-05-28 15:25:57,037 [grpc-default-executor-0] INFO  
> server.GrpcServerProtocolService 
> (GrpcServerProtocolService.java:onCompleted(200)) - 
> 51eedc82-c249-4ac2-bdcd-05a70b0f674c: Completed INSTALL_SNAPSHOT, 
> lastRequest: b6dabbe9-8da8-4143-a83f-7f5b0eb07845->follower#0-t2,notify:(t:1, 
> i:0)
> 2024-05-28 15:25:57,037 [grpc-default-executor-0] INFO  
> server.GrpcServerProtocolService 
> (GrpcServerProtocolService.java:lambda$onCompleted$7(202)) - 
> 51eedc82-c249-4ac2-bdcd-05a70b0f674c: Completed INSTALL_SNAPSHOT, lastReply: 
> null
> 2024-05-28 15:25:57,037 [grpc-default-executor-5] INFO  
> server.GrpcLogAppender (GrpcLogAppender.java:onNext(658)) - 
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845@group-A959FA14CAD1->follower-InstallSnapshotResponseHandler:
>  received a reply 
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845<-51eedc82-c249-4ac2-bdcd-05a70b0f674c#0:FAIL-t2,IN_PROGRESS,snapshotIndex=0
> 2024-05-28 15:25:57,037 [grpc-default-executor-5] INFO  
> server.GrpcLogAppender (GrpcLogAppender.java:onNext(674)) - 
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845@group-A959FA14CAD1->follower-InstallSnapshotResponseHandler:
>  InstallSnapshot in progress.
> 2024-05-28 15:25:57,037 [51eedc82-c249-4ac2-bdcd-05a70b0f674c-server-thread1] 
> INFO  server.RaftServer$Division 
> (RaftServerImpl.java:checkInconsistentAppendEntries(1621)) - 
> 51eedc82-c249-4ac2-bdcd-05a70b0f674c@group-A959FA14CAD1: Failed appendEntries 
> as snapshot (0) installation is in progress
> 2024-05-28 15:25:57,037 [51eedc82-c249-4ac2-bdcd-05a70b0f674c-server-thread1] 
> INFO  server.RaftServer$Division 
> (RaftServerImpl.java:appendEntriesAsync(1575)) - 
> 51eedc82-c249-4ac2-bdcd-05a70b0f674c@group-A959FA14CAD1: appendEntries* reply 
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845<-51eedc82-c249-4ac2-bdcd-05a70b0f674c#127987:FAIL-t2,INCONSISTENCY,nextIndex=0,followerCommit=-1,matchIndex=-1
> 2024-05-28 15:25:57,038 [grpc-default-executor-5] WARN  
> server.GrpcLogAppender (GrpcLogAppender.java:onNextImpl(528)) - 
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845@group-A959FA14CAD1->follower-AppendLogResponseHandler:
>  received INCONSISTENCY reply with nextIndex 0, errorCount=1, 
> request=AppendEntriesRequest:cid=127987,entriesCount=0
> 2024-05-28 15:25:57,038 
> [b6dabbe9-8da8-4143-a83f-7f5b0eb07845@group-A959FA14CAD1->follower-GrpcLogAppender-LogAppenderDaemon]
>  INFO  server.GrpcLogAppender 
> (GrpcLogAppender.java:notifyInstallSnapshot(799)) - 
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845@group-A959FA14CAD1->follower-GrpcLogAppender:
>  notifyInstallSnapshot with firstAvailable=(t:1, i:0), followerNextIndex=0
> 2024-05-28 15:25:57,038 
> [b6dabbe9-8da8-4143-a83f-7f5b0eb07845@group-A959FA14CAD1->follower-GrpcLogAppender-LogAppenderDaemon]
>  INFO  server.GrpcLogAppender 
> (GrpcLogAppender.java:notifyInstallSnapshot(807)) - 
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845@group-A959FA14CAD1->follower-GrpcLogAppender:
>  send b6dabbe9-8da8-4143-a83f-7f5b0eb07845->follower#0-t2,notify:(t:1, i:0)
> 2024-05-28 15:25:57,038 [grpc-default-executor-5] INFO  
> impl.SnapshotInstallationHandler 
> (SnapshotInstallationHandler.java:installSnapshot(92)) - 
> 51eedc82-c249-4ac2-bdcd-05a70b0f674c@group-A959FA14CAD1: receive 
> installSnapshot: 
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845->follower#0-t2,notify:(t:1, i:0)
> 2024-05-28 15:25:57,038 [grpc-default-executor-5] INFO  
> impl.SnapshotInstallationHandler 
> (SnapshotInstallationHandler.java:installSnapshot(103)) - 
> 51eedc82-c249-4ac2-bdcd-05a70b0f674c@group-A959FA14CAD1: reply 
> installSnapshot: 
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845<-51eedc82-c249-4ac2-bdcd-05a70b0f674c#0:FAIL-t2,IN_PROGRESS,snapshotIndex=0
> 2024-05-28 15:25:57,038 [grpc-default-executor-5] INFO  
> server.GrpcServerProtocolService 
> (GrpcServerProtocolService.java:onCompleted(200)) - 
> 51eedc82-c249-4ac2-bdcd-05a70b0f674c: Completed INSTALL_SNAPSHOT, 
> lastRequest: b6dabbe9-8da8-4143-a83f-7f5b0eb07845->follower#0-t2,notify:(t:1, 
> i:0)
> 2024-05-28 15:25:57,038 [grpc-default-executor-5] INFO  
> server.GrpcServerProtocolService 
> (GrpcServerProtocolService.java:lambda$onCompleted$7(202)) - 
> 51eedc82-c249-4ac2-bdcd-05a70b0f674c: Completed INSTALL_SNAPSHOT, lastReply: 
> null
> 2024-05-28 15:25:57,038 [grpc-default-executor-0] INFO  
> server.GrpcLogAppender (GrpcLogAppender.java:onNext(658)) - 
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845@group-A959FA14CAD1->follower-InstallSnapshotResponseHandler:
>  received a reply 
> b6dabbe9-8da8-4143-a83f-7f5b0eb07845<-51eedc82-c249-4ac2-bdcd-05a70b0f674c#0:FAIL-t2,IN_PROGRESS,snapshotIndex=0
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to