Kevin Liu created RATIS-2137:
--------------------------------

             Summary: Leader fails to send correct index to follower after timeout exception
                 Key: RATIS-2137
                 URL: https://issues.apache.org/jira/browse/RATIS-2137
             Project: Ratis
          Issue Type: Bug
    Affects Versions: 2.5.1
            Reporter: Kevin Liu


I found that the follower became unavailable after logging the following:

24/08/11 09:03:13,714 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
1@group-47BEDE733167: Failed appendEntries as the first entry (index 34795876) 
already exists (snapshotIndex: 34670809, commitIndex: 34795893)
24/08/11 09:03:13,714 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
1@group-47BEDE733167: inconsistency entries. 
Reply:3<-1#2559343:FAIL-t59,INCONSISTENCY,nextIndex=34795894,followerCommit=34795893,matchIndex=-1
24/08/11 09:03:13,715 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
1@group-47BEDE733167: Failed appendEntries as the first entry (index 34795875) 
already exists (snapshotIndex: 34670809, commitIndex: 34795893)
24/08/11 09:03:13,715 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 
1@group-47BEDE733167: inconsistency entries. 
Reply:3<-1#2559406:FAIL-t59,INCONSISTENCY,nextIndex=34795894,followerCommit=34795893,matchIndex=-1

Here is what I found in the leader's log:

24/08/11 09:03:10,130 WARN 
[3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] LogAppender: 
3@group-47BEDE733167->1-LogAppenderDefault: Failed to appendEntries (retry=1): 
org.apache.ratis.protocol.exceptions.TimeoutIOException
24/08/11 09:03:13,714 INFO 
[3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] FollowerInfo: 
3@group-47BEDE733167->1: decreaseNextIndex nextIndex: updateUnconditionally 
34795876 -> 34795875
24/08/11 09:03:13,715 INFO 
[3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] FollowerInfo: 
3@group-47BEDE733167->1: decreaseNextIndex nextIndex: updateUnconditionally 
34795875 -> 34795874

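My guess is that when the leader called appendEntries the first time, the 
follower applied the entries successfully, but the leader never received the 
follower's response and timed out. When the leader resent the request, the 
follower rejected it because the first entry already existed. The leader then 
started calling decreaseNextIndex, stepping nextIndex back by one each time 
(34795876 -> 34795875 -> 34795874), even though the follower's reply reported 
nextIndex=34795894.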
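
To make the suspected sequence concrete, here is a small toy model in Java. It 
is not Ratis code: the class and method names are hypothetical, and it assumes 
(as a simplification) that the follower accepts an append only when the first 
entry index equals the nextIndex it reported in its reply. It compares stepping 
nextIndex back by one, as seen in the leader log, with adopting the nextIndex 
hint from the follower's INCONSISTENCY reply.

```java
// Toy model only. NextIndexSketch and its methods are hypothetical names,
// not part of the Ratis API.
final class NextIndexSketch {

    /**
     * Returns how many appendEntries attempts are rejected before the
     * follower accepts one, or -1 if none is accepted within maxAttempts.
     *
     * Assumption: the follower accepts only when the leader's nextIndex
     * equals the follower's own nextIndex.
     */
    static int rejectionsUntilAccepted(long leaderNextIndex,
                                       long followerNextIndex,
                                       boolean useReplyHint,
                                       int maxAttempts) {
        long next = leaderNextIndex;
        for (int rejected = 0; rejected < maxAttempts; rejected++) {
            if (next == followerNextIndex) {
                return rejected; // follower accepts this attempt
            }
            // Rejected: either jump to the hint carried in the reply,
            // or step back by one as in the leader log above.
            next = useReplyHint ? followerNextIndex : next - 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        long leaderNextIndex = 34795876L;   // from the leader log
        long followerNextIndex = 34795894L; // nextIndex in the follower's reply

        // Stepping back by one moves away from the follower's nextIndex.
        System.out.println(
            rejectionsUntilAccepted(leaderNextIndex, followerNextIndex, false, 1000)); // prints -1

        // Adopting the reply's hint converges after a single rejection.
        System.out.println(
            rejectionsUntilAccepted(leaderNextIndex, followerNextIndex, true, 1000)); // prints 1
    }
}
```

In this model the leader is already behind the follower, so decrementing can 
never reach the index the follower expects. That would match the repeated 
INCONSISTENCY replies above, but whether the real LogAppender behaves this way 
needs to be confirmed against the Ratis source.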



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
