[ 
https://issues.apache.org/jira/browse/RATIS-982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated RATIS-982:
-----------------------------
    Description: 
This happens in tests, but it may also happen in production.

For example, the leader is s3 and the follower is s4.
1. Kill s4 and restart it.

{code:java}
2020-06-19T07:03:18.1000860Z 2020-06-19 07:03:18,095 [Thread-6194] INFO  
ratis.MiniRaftCluster (MiniRaftCluster.java:killServer(458)) - killServer s4
2020-06-19T07:03:18.1001826Z 2020-06-19 07:03:18,095 [Thread-6194] INFO  
ratis.MiniRaftCluster (MiniRaftCluster.java:newRaftServer(330)) - 
newRaftServer: s4, group-5BD7E8A01610:[s3:0.0.0.0:43375, s4:0.0.0.0:33719, 
s0:0.0.0.0:34867, s1:0.0.0.0:33783, s2:0.0.0.0:40473], format? false
{code}

2. s4 starts and sets the configuration from storage at 
[setRaftConf(raftConf.getLogEntryIndex(), raftConf) 
|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java#L170]
 and then changes to RUNNING at 
[lifeCycle.transition(RUNNING)|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L213]

2020-06-19T07:03:18.1345896Z 2020-06-19 07:03:18,127 [pool-16-thread-1] INFO  
impl.RaftServerImpl (ServerState.java:setRaftConf(356)) - 
s4@group-5BD7E8A01610: set configuration 0: [s3:0.0.0.0:43375, 
s4:0.0.0.0:33719, s0:0.0.0.0:34867, s1:0.0.0.0:33783, s2:0.0.0.0:40473], 
old=null at 0

3. s3 sends an append-entries request to s4, and s4 changes to RUNNING at 
[lifeCycle.compareAndTransition(STARTING, 
RUNNING)|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L1003]

4. If the change to RUNNING in step 3 happens before the one in step 2, then 
lifeCycle.transition(RUNNING) in step 2 finds s4 already RUNNING and throws an 
exception.
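
The ordering problem in steps 2 to 4 can be sketched in isolation. The class below is a hypothetical stand-in, not the Ratis LifeCycle class: the names MiniLifeCycle, isValid, appendEntriesBeforeStart and tolerantStart, the set of legal transitions, and the exception message are all assumptions made for illustration. It only shows why an unconditional transition fails once a conditional one has already won the race.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a server life cycle; NOT the Ratis LifeCycle class.
final class LifeCycleRaceSketch {
  enum State { NEW, STARTING, RUNNING }

  static final class MiniLifeCycle {
    private final AtomicReference<State> current = new AtomicReference<>(State.NEW);

    State get() {
      return current.get();
    }

    // Assumption: only NEW -> STARTING and STARTING -> RUNNING are legal.
    private static boolean isValid(State from, State to) {
      return (from == State.NEW && to == State.STARTING)
          || (from == State.STARTING && to == State.RUNNING);
    }

    // Unconditional transition: throws if the current state does not allow it.
    void transition(State to) {
      State from;
      do {
        from = current.get();
        if (!isValid(from, to)) {
          throw new IllegalStateException("Illegal transition: " + from + " -> " + to);
        }
      } while (!current.compareAndSet(from, to));
    }

    // Conditional transition: changes state only if it is currently 'from'.
    boolean compareAndTransition(State from, State to) {
      if (!isValid(from, to)) {
        throw new IllegalStateException("Illegal transition: " + from + " -> " + to);
      }
      return current.compareAndSet(from, to);
    }
  }

  // Step 3 runs before step 2. Returns the message of the exception
  // thrown by step 2, or null if step 2 succeeds.
  static String appendEntriesBeforeStart() {
    MiniLifeCycle lifeCycle = new MiniLifeCycle();
    lifeCycle.transition(State.STARTING);
    lifeCycle.compareAndTransition(State.STARTING, State.RUNNING); // step 3
    try {
      lifeCycle.transition(State.RUNNING); // step 2
      return null;
    } catch (IllegalStateException e) {
      return e.getMessage();
    }
  }

  // Same sequence, but step 2 also uses the conditional transition,
  // so it becomes a no-op when step 3 has already moved to RUNNING.
  static State tolerantStart(boolean appendEntriesFirst) {
    MiniLifeCycle lifeCycle = new MiniLifeCycle();
    lifeCycle.transition(State.STARTING);
    if (appendEntriesFirst) {
      lifeCycle.compareAndTransition(State.STARTING, State.RUNNING); // step 3
    }
    lifeCycle.compareAndTransition(State.STARTING, State.RUNNING); // step 2
    return lifeCycle.get();
  }

  public static void main(String[] args) {
    System.out.println(appendEntriesBeforeStart());
    System.out.println(tolerantStart(true));
  }
}
```

Making step 2 conditional as well is only one way to tolerate either ordering; it is not necessarily the change that was made for this issue.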


  was:
This happens in tests, but it may also happen in production.

For example, the leader is s3 and the follower is s4.
1. Kill s4 and restart it.

{code:java}
2020-06-19T07:03:18.1000860Z 2020-06-19 07:03:18,095 [Thread-6194] INFO  
ratis.MiniRaftCluster (MiniRaftCluster.java:killServer(458)) - killServer s4
2020-06-19T07:03:18.1001826Z 2020-06-19 07:03:18,095 [Thread-6194] INFO  
ratis.MiniRaftCluster (MiniRaftCluster.java:newRaftServer(330)) - 
newRaftServer: s4, group-5BD7E8A01610:[s3:0.0.0.0:43375, s4:0.0.0.0:33719, 
s0:0.0.0.0:34867, s1:0.0.0.0:33783, s2:0.0.0.0:40473], format? false
{code}

2. s4 sets the configuration from storage at 
setRaftConf(raftConf.getLogEntryIndex(), raftConf) 
2020-06-19T07:03:18.1345896Z 2020-06-19 07:03:18,127 [pool-16-thread-1] INFO  
impl.RaftServerImpl (ServerState.java:setRaftConf(356)) - 
s4@group-5BD7E8A01610: set configuration 0: [s3:0.0.0.0:43375, 
s4:0.0.0.0:33719, s0:0.0.0.0:34867, s1:0.0.0.0:33783, s2:0.0.0.0:40473], 
old=null at 0




> RaftServerImpl failed to change from RUNNING to RUNNING
> -------------------------------------------------------
>
>                 Key: RATIS-982
>                 URL: https://issues.apache.org/jira/browse/RATIS-982
>             Project: Ratis
>          Issue Type: Bug
>            Reporter: runzhiwang
>            Assignee: runzhiwang
>            Priority: Major
>
> This happens in tests, but it may also happen in production.
> For example, the leader is s3 and the follower is s4.
> 1. Kill s4 and restart it.
> {code:java}
> 2020-06-19T07:03:18.1000860Z 2020-06-19 07:03:18,095 [Thread-6194] INFO  
> ratis.MiniRaftCluster (MiniRaftCluster.java:killServer(458)) - killServer s4
> 2020-06-19T07:03:18.1001826Z 2020-06-19 07:03:18,095 [Thread-6194] INFO  
> ratis.MiniRaftCluster (MiniRaftCluster.java:newRaftServer(330)) - 
> newRaftServer: s4, group-5BD7E8A01610:[s3:0.0.0.0:43375, s4:0.0.0.0:33719, 
> s0:0.0.0.0:34867, s1:0.0.0.0:33783, s2:0.0.0.0:40473], format? false
> {code}
> 2. s4 starts and sets the configuration from storage at 
> [setRaftConf(raftConf.getLogEntryIndex(), raftConf) 
> |https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java#L170]
>  and then changes to RUNNING at 
> [lifeCycle.transition(RUNNING)|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L213]
> 2020-06-19T07:03:18.1345896Z 2020-06-19 07:03:18,127 [pool-16-thread-1] INFO  
> impl.RaftServerImpl (ServerState.java:setRaftConf(356)) - 
> s4@group-5BD7E8A01610: set configuration 0: [s3:0.0.0.0:43375, 
> s4:0.0.0.0:33719, s0:0.0.0.0:34867, s1:0.0.0.0:33783, s2:0.0.0.0:40473], 
> old=null at 0
> 3. s3 sends an append-entries request to s4, and s4 changes to RUNNING at 
> [lifeCycle.compareAndTransition(STARTING, 
> RUNNING)|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L1003]
> 4. If the change to RUNNING in step 3 happens before the one in step 2, then 
> lifeCycle.transition(RUNNING) in step 2 finds s4 already RUNNING and throws an 
> exception.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
