SzyWilliam commented on PR #1091:
URL: https://github.com/apache/ratis/pull/1091#issuecomment-2109363882
@szetszwo, thanks for reviewing this!
The original `configurationManager` before the patch will produce another
`IllegalStateException` as follows:
```java
2024-05-14 13:46:03,722 [grpc-default-executor-8] WARN
server.GrpcServerProtocolService (LogUtils.java:warn(123)) - s4: Failed
INSTALL_SNAPSHOT request cid=0, isHeartbeat? false
java.lang.IllegalStateException
at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:35)
at
org.apache.ratis.server.impl.ConfigurationManager.addConfiguration(ConfigurationManager.java:68)
at
org.apache.ratis.server.impl.ServerState.setRaftConf(ServerState.java:382)
at
org.apache.ratis.server.impl.ServerState.setRaftConf(ServerState.java:375)
at
org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshotImpl(SnapshotInstallationHandler.java:138)
at
org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshot(SnapshotInstallationHandler.java:97)
at
org.apache.ratis.server.impl.RaftServerImpl.installSnapshot(RaftServerImpl.java:1652)
at
org.apache.ratis.server.impl.RaftServerProxy.installSnapshot(RaftServerProxy.java:674)
at
org.apache.ratis.grpc.server.GrpcServerProtocolService$2.process(GrpcServerProtocolService.java:341)
at
org.apache.ratis.grpc.server.GrpcServerProtocolService$2.process(GrpcServerProtocolService.java:338)
at
org.apache.ratis.grpc.server.GrpcServerProtocolService$ServerRequestStreamObserver.process(GrpcServerProtocolService.java:106)
at
org.apache.ratis.grpc.server.GrpcServerProtocolService$ServerRequestStreamObserver.onNext(GrpcServerProtocolService.java:174)
at
org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:262)
at
org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailableInternal(ServerCallImpl.java:329)
at
org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:314)
at
org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:833)
at
org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at
org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
```
The event sequence that triggers the exception is as follows (a little bit
tricky):
1. Original raft group [A,B,C] with A as the leader.
2. Leader A receives setConfiguration [A,B,C,D,E] from client. This setConf
request will form a configuration change log L0 (old=[A,B,C], new=[A,B,C,D,E]).
3. Leader A replicates the snapshot and logs(up to L0, which is uncommitted)
to D, E.
4. D and E receive L0 and they both append this configuration to the
`configurationManager`.
5. B and C do not receive L0 so the L0 remains uncommitted.
6. Unfortunately, leader A crashes, this client request setConfiguration
also fails.
7. B is elected as the new leader. The first thing B comes to power is to
replicate a group configuration log L1 (old=[A,B,C], new=[A,B,C]) to establish
its leader authority. NOTICE this log L1 shares the same index as L0 (the first
uncommited index).
8. Client retries the setConfiguration [A,B,C,D,E]. B tries to append log L1
to D and E. Since D and E's `configurationManager` already contain a
configuration at index L0, this new configuration L1 will be rejected and the
`IllegalStateException` will be thrown.
So back to the question, I think we should not allow two confs with the same
**committed** index. For the uncommitted confs, we should force it to be
aligned with the current leader (just as Raft Algo will truncates those
conflicting local uncommitted logs). What do you think?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]