Kevin Liu created RATIS-2137:
--------------------------------
Summary: Leader fails to send correct index to follower after timeout exception
Key: RATIS-2137
URL: https://issues.apache.org/jira/browse/RATIS-2137
Project: Ratis
Issue Type: Bug
Affects Versions: 2.5.1
Reporter: Kevin Liu
I found that after the following log messages, the follower became unavailable:
24/08/11 09:03:13,714 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 1@group-47BEDE733167: Failed appendEntries as the first entry (index 34795876) already exists (snapshotIndex: 34670809, commitIndex: 34795893)
24/08/11 09:03:13,714 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 1@group-47BEDE733167: inconsistency entries. Reply:3<-1#2559343:FAIL-t59,INCONSISTENCY,nextIndex=34795894,followerCommit=34795893,matchIndex=-1
24/08/11 09:03:13,715 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 1@group-47BEDE733167: Failed appendEntries as the first entry (index 34795875) already exists (snapshotIndex: 34670809, commitIndex: 34795893)
24/08/11 09:03:13,715 INFO [nioEventLoopGroup-3-3] RaftServer$Division: 1@group-47BEDE733167: inconsistency entries. Reply:3<-1#2559406:FAIL-t59,INCONSISTENCY,nextIndex=34795894,followerCommit=34795893,matchIndex=-1
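Both follower replies carry the same nextIndex=34795894, which is commitIndex + 1. Below is a minimal sketch of a check that would produce these replies. It assumes the follower rejects any request whose first entry is at or below max(snapshotIndex, commitIndex) and reports the index just after it. This rule is inferred from the log, not taken from the Ratis source, and `checkFirstEntry` is a hypothetical name.

```java
// Hypothetical, simplified model of the follower-side check suggested by the log above.
// It is NOT the Ratis implementation and ignores every other appendEntries check.
class FollowerCheckSketch {
    /**
     * Returns -1 if the first entry is new to the follower; otherwise returns the
     * nextIndex the follower would report in its INCONSISTENCY reply.
     */
    static long checkFirstEntry(long firstEntryIndex, long snapshotIndex, long commitIndex) {
        long highestKnown = Math.max(snapshotIndex, commitIndex);
        if (firstEntryIndex <= highestKnown) {
            // The entry is already committed (or covered by the snapshot): reject it.
            return highestKnown + 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Values taken from the two rejected requests in the follower log.
        System.out.println(checkFirstEntry(34795876L, 34670809L, 34795893L)); // prints 34795894
        System.out.println(checkFirstEntry(34795875L, 34670809L, 34795893L)); // prints 34795894
    }
}
```

Under this model both rejected requests map to the same reply, so moving the leader's nextIndex further back can never make the check pass.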
Here is what I found in the leader's log:
24/08/11 09:03:10,130 WARN [3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] LogAppender: 3@group-47BEDE733167->1-LogAppenderDefault: Failed to appendEntries (retry=1): org.apache.ratis.protocol.exceptions.TimeoutIOException
24/08/11 09:03:13,714 INFO [3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] FollowerInfo: 3@group-47BEDE733167->1: decreaseNextIndex nextIndex: updateUnconditionally 34795876 -> 34795875
24/08/11 09:03:13,715 INFO [3@group-47BEDE733167->1-LogAppenderDefault-LogAppenderDaemon] FollowerInfo: 3@group-47BEDE733167->1: decreaseNextIndex nextIndex: updateUnconditionally 34795875 -> 34795874
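My guess is that when the leader called appendEntries the first time, the follower
applied the entries successfully, but the leader never received the follower's
response and timed out. When the leader retried, the follower rejected the request
with INCONSISTENCY because the first entry already existed, and the leader then
called decreaseNextIndex, moving nextIndex further back (34795876 -> 34795875 ->
34795874) instead of forward to the nextIndex the follower reported (34795894).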
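The suspected loop can be illustrated with a small, hypothetical simulation (not Ratis code). It assumes the follower rejects any request whose first entry is at or below its commit index and replies with nextIndex = commitIndex + 1, as the follower log suggests, and that the leader reacts to each INCONSISTENCY reply by decrementing nextIndex by one, as the leader log shows. It ignores the previous-entry term check and all other appendEntries logic. The names `follower`, `run`, `decrementPolicy` and `adoptReplyPolicy` are illustrative only; the second policy is a hypothetical alternative, not a proposed patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongBinaryOperator;

// Hypothetical simulation of the retry loop described above; NOT Ratis source code.
class RetryLoopSketch {
    static final long COMMIT_INDEX = 34795893L;   // follower commitIndex from the log
    static final long SNAPSHOT_INDEX = 34670809L; // follower snapshotIndex from the log

    /** Follower model: returns -1 on success, otherwise the nextIndex in the INCONSISTENCY reply. */
    static long follower(long firstEntryIndex) {
        long known = Math.max(SNAPSHOT_INDEX, COMMIT_INDEX);
        return firstEntryIndex <= known ? known + 1 : -1;
    }

    /**
     * Runs up to maxRounds appendEntries attempts starting at nextIndex and returns the
     * leader's nextIndex before each attempt. The policy maps
     * (current nextIndex, reply nextIndex) to the leader's new nextIndex.
     */
    static List<Long> run(long nextIndex, int maxRounds, LongBinaryOperator policy) {
        List<Long> history = new ArrayList<>();
        for (int round = 0; round < maxRounds; round++) {
            history.add(nextIndex);
            long replyNextIndex = follower(nextIndex);
            if (replyNextIndex < 0) {
                break; // the follower accepted the entries
            }
            nextIndex = policy.applyAsLong(nextIndex, replyNextIndex);
        }
        return history;
    }

    public static void main(String[] args) {
        // Behavior seen in the leader log: step back by one on every INCONSISTENCY reply.
        LongBinaryOperator decrementPolicy = (current, reply) -> current - 1;
        // Hypothetical alternative: jump to the nextIndex the follower reported.
        LongBinaryOperator adoptReplyPolicy = (current, reply) -> reply;

        System.out.println(run(34795876L, 3, decrementPolicy));  // prints [34795876, 34795875, 34795874]
        System.out.println(run(34795876L, 3, adoptReplyPolicy)); // prints [34795876, 34795894]
    }
}
```

With the decrement policy every attempt is rejected, matching the repeated INCONSISTENCY replies above; with the follower's reported nextIndex, the second attempt passes the model's check.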
--
This message was sent by Atlassian Jira
(v8.20.10#820010)