Xu Shao Hong created RATIS-1707:
-----------------------------------
Summary: Fix corner case when getPrevious in LogAppenderBase
Key: RATIS-1707
URL: https://issues.apache.org/jira/browse/RATIS-1707
Project: Ratis
Issue Type: Bug
Reporter: Xu Shao Hong
Assignee: Xu Shao Hong
In our cluster, we found many {*}Failed appendEntries{*} exceptions with the
following message.
_Unexpected Index: previous is null but entries[0].getIndex()=3402406. So the
follower cannot append an entry due to some limit._
The scenario:
Follower B restarted. Leader A sent an entry, and B could not find the previous
log entry. A then sent a notifyInstallSnapshot request to B, and B found that
its next index was larger than the leader's firstAvailableLogIndex (the index
at which to install the snapshot). A updated B's index according to the reply
and sent the entries to B. A looks up the previous entry's TermIndex through
``getPrevious(long nextIndex)``. If B's nextIndex is exactly the startIndex of
leader A's raft log (B needs the entries starting from A's
firstAvailableLogIndex), the previous entry has already been purged from A's
raft log, so A checks whether the snapshot index from
+server.getStateMachine().getLatestSnapshot()+ equals nextIndex - 1. If it does
not, getPrevious returns null.
The problem is that purging of the raft log is not guaranteed to keep pace with
snapshots. If A has also been stopped, which triggers takeSnapshot, the raft
log may not be purged up to the new snapshot index. The latest snapshot index
from the state machine (SM) then no longer matches the raft log's first
available index, which leads to this corner case.
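The lookup and the state that defeats it can be sketched roughly as follows.
This is a simplified, self-contained model and not the actual LogAppenderBase
code: the TermIndex class, the TreeMap standing in for the raft log, the logOf
helper, the term value 5 and the snapshot index 3402506 are all assumptions
made for illustration. Only the index 3402406 is taken from the error message
above.

```java
import java.util.TreeMap;

class GetPreviousSketch {
  /** Minimal stand-in for a (term, index) pair; not the Ratis TermIndex class. */
  static final class TermIndex {
    final long term;
    final long index;

    TermIndex(long term, long index) {
      this.term = term;
      this.index = index;
    }

    @Override
    public String toString() {
      return "(t:" + term + ", i:" + index + ")";
    }
  }

  /**
   * Simplified model of the lookup described in this issue.
   * The map holds index -> term for entries the leader still has;
   * purged entries are simply absent.
   */
  static TermIndex getPrevious(long nextIndex, TreeMap<Long, Long> log,
      TermIndex latestSnapshot) {
    if (nextIndex <= 0) {
      return null; // assumed here: nothing precedes the first entry
    }
    final long previousIndex = nextIndex - 1;
    final Long term = log.get(previousIndex);
    if (term != null) {
      return new TermIndex(term, previousIndex); // still in the raft log
    }
    // The entry was purged: fall back to the snapshot, but only on an exact match.
    if (latestSnapshot != null && latestSnapshot.index == previousIndex) {
      return latestSnapshot;
    }
    return null; // the leader then sends entries with a null previous
  }

  /** Hypothetical helper: a log holding entries [start, end], all in one term. */
  static TreeMap<Long, Long> logOf(long start, long end, long term) {
    final TreeMap<Long, Long> log = new TreeMap<>();
    for (long i = start; i <= end; i++) {
      log.put(i, term);
    }
    return log;
  }

  public static void main(String[] args) {
    final long nextIndex = 3402406L; // entries[0].getIndex() from the error message
    final TreeMap<Long, Long> log = logOf(nextIndex, nextIndex + 100, 5L);

    // Purge kept pace: snapshot index == nextIndex - 1, so a previous is found.
    System.out.println(getPrevious(nextIndex, log, new TermIndex(5L, nextIndex - 1)));

    // Snapshot taken at shutdown is ahead of the purge point: previous is null.
    System.out.println(getPrevious(nextIndex, log, new TermIndex(5L, nextIndex + 100)));
  }
}
```

In this model the first call prints (t:5, i:3402405) and the second prints
null, which corresponds to the "previous is null" failure seen on the follower.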
We could add a check for this case in getPrevious.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)