[ https://issues.apache.org/jira/browse/RATIS-982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
runzhiwang updated RATIS-982:
-----------------------------
Description:
This happens in a test, but it may also happen in production.
For example, the leader is s3 and the follower is s4.
1. Kill s4, and restart s4.
{code:java}
2020-06-19T07:03:18.1000860Z 2020-06-19 07:03:18,095 [Thread-6194] INFO ratis.MiniRaftCluster (MiniRaftCluster.java:killServer(458)) - killServer s4
2020-06-19T07:03:18.1001826Z 2020-06-19 07:03:18,095 [Thread-6194] INFO ratis.MiniRaftCluster (MiniRaftCluster.java:newRaftServer(330)) - newRaftServer: s4, group-5BD7E8A01610:[s3:0.0.0.0:43375, s4:0.0.0.0:33719, s0:0.0.0.0:34867, s1:0.0.0.0:33783, s2:0.0.0.0:40473], format? false
{code}
2. s4 starts and sets its configuration from storage at [setRaftConf(raftConf.getLogEntryIndex(), raftConf)|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java#L170], and s4 will then change to RUNNING at [lifeCycle.transition(RUNNING)|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L213].
2020-06-19T07:03:18.1345896Z 2020-06-19 07:03:18,127 [pool-16-thread-1] INFO impl.RaftServerImpl (ServerState.java:setRaftConf(356)) - s4@group-5BD7E8A01610: set configuration 0: [s3:0.0.0.0:43375, s4:0.0.0.0:33719, s0:0.0.0.0:34867, s1:0.0.0.0:33783, s2:0.0.0.0:40473], old=null at 0
3. s3 sends an append entries request to s4, and s4 changes to RUNNING at [lifeCycle.compareAndTransition(STARTING, RUNNING)|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L1003].
4. If the change to RUNNING in step 3 happens before the one in step 2, then lifeCycle.transition(RUNNING) in step 2 throws an exception. s4 is already RUNNING at that point, and a change from RUNNING to RUNNING is not allowed.
was:
This happens in test, but it maybe also happen in production.
For example, leader is s3 and follower is s4.
1. kill s4, and restart s4.
{code:java}
2020-06-19T07:03:18.1000860Z 2020-06-19 07:03:18,095 [Thread-6194] INFO ratis.MiniRaftCluster (MiniRaftCluster.java:killServer(458)) - killServer s4
2020-06-19T07:03:18.1001826Z 2020-06-19 07:03:18,095 [Thread-6194] INFO ratis.MiniRaftCluster (MiniRaftCluster.java:newRaftServer(330)) - newRaftServer: s4, group-5BD7E8A01610:[s3:0.0.0.0:43375, s4:0.0.0.0:33719, s0:0.0.0.0:34867, s1:0.0.0.0:33783, s2:0.0.0.0:40473], format? false
{code}
2. s4 set configuration from storage at setRaftConf(raftConf.getLogEntryIndex(), raftConf)
2020-06-19T07:03:18.1345896Z 2020-06-19 07:03:18,127 [pool-16-thread-1] INFO impl.RaftServerImpl (ServerState.java:setRaftConf(356)) - s4@group-5BD7E8A01610: set configuration 0: [s3:0.0.0.0:43375, s4:0.0.0.0:33719, s0:0.0.0.0:34867, s1:0.0.0.0:33783, s2:0.0.0.0:40473], old=null at 0

> RaftServerImpl failed to change from RUNNING to RUNNING
> -------------------------------------------------------
>
> Key: RATIS-982
> URL: https://issues.apache.org/jira/browse/RATIS-982
> Project: Ratis
> Issue Type: Bug
> Reporter: runzhiwang
> Assignee: runzhiwang
> Priority: Major
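The race in steps 2 to 4 of the description can be replayed deterministically with a minimal Java sketch. It assumes a simplified lifecycle. `SimpleLifeCycle`, `replayRace` and `replayWithTolerantStart` are hypothetical names and are not Ratis APIs. The sketch runs the problematic order on a single thread. First the append entries path calls compareAndTransition(STARTING, RUNNING) and wins. Then the start path's unconditional transition(RUNNING) is asked to move from RUNNING to RUNNING, and it throws. The second method shows one possible mitigation, which is an assumption rather than necessarily the fix committed for RATIS-982: the start path also uses a conditional transition and accepts RUNNING when another thread got there first.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

public class LifeCycleRaceSketch {
  // Hypothetical subset of lifecycle states, for illustration only.
  enum State { NEW, STARTING, RUNNING, CLOSING, CLOSED }

  // Hypothetical, simplified stand-in for a server lifecycle; NOT Ratis's LifeCycle class.
  static final class SimpleLifeCycle {
    private static final Map<State, Set<State>> VALID = new EnumMap<>(State.class);

    static {
      VALID.put(State.NEW, EnumSet.of(State.STARTING));
      VALID.put(State.STARTING, EnumSet.of(State.RUNNING, State.CLOSING));
      VALID.put(State.RUNNING, EnumSet.of(State.CLOSING)); // RUNNING -> RUNNING is not allowed
      VALID.put(State.CLOSING, EnumSet.of(State.CLOSED));
      VALID.put(State.CLOSED, EnumSet.noneOf(State.class));
    }

    private final AtomicReference<State> current = new AtomicReference<>(State.NEW);

    State getCurrentState() {
      return current.get();
    }

    // Unconditional transition: throws if the current state cannot move to `to`.
    void transition(State to) {
      while (true) {
        final State from = current.get();
        if (!VALID.get(from).contains(to)) {
          throw new IllegalStateException("Failed to change from " + from + " to " + to);
        }
        if (current.compareAndSet(from, to)) {
          return;
        }
      }
    }

    // Conditional transition: succeeds only if the current state is still `from`.
    boolean compareAndTransition(State from, State to) {
      if (!VALID.get(from).contains(to)) {
        throw new IllegalStateException("Invalid transition " + from + " -> " + to);
      }
      return current.compareAndSet(from, to);
    }
  }

  // Replays the reported order: step 3 (append entries) runs before step 2 (start).
  static String replayRace() {
    SimpleLifeCycle lc = new SimpleLifeCycle();
    lc.transition(State.STARTING);                           // restarted server is STARTING
    lc.compareAndTransition(State.STARTING, State.RUNNING);  // step 3 wins the race
    try {
      lc.transition(State.RUNNING);                          // step 2: RUNNING -> RUNNING
      return "no error";
    } catch (IllegalStateException e) {
      return e.getMessage();
    }
  }

  // Hypothetical mitigation: the start path also transitions conditionally and
  // treats "already RUNNING" as success instead of an error.
  static State replayWithTolerantStart() {
    SimpleLifeCycle lc = new SimpleLifeCycle();
    lc.transition(State.STARTING);
    lc.compareAndTransition(State.STARTING, State.RUNNING);  // step 3 wins again
    if (!lc.compareAndTransition(State.STARTING, State.RUNNING)
        && lc.getCurrentState() != State.RUNNING) {
      throw new IllegalStateException("Unexpected state " + lc.getCurrentState());
    }
    return lc.getCurrentState();
  }

  public static void main(String[] args) {
    System.out.println(replayRace());              // Failed to change from RUNNING to RUNNING
    System.out.println(replayWithTolerantStart()); // RUNNING
  }
}
```

Replaying the interleaving on one thread makes the failure deterministic. In the MiniRaftCluster test, the two calls run on different threads, so the exception only appears when the append entries request arrives during the window between STARTING and the start path's transition.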