Xushaohong opened a new pull request, #745: URL: https://github.com/apache/ratis/pull/745
## What changes were proposed in this pull request?

In the cluster, we found many exceptions on failed appendEntries: "Unexpected Index: previous is null but entries[0].getIndex()=3402406". As a result, the follower rejects the entries and cannot append them.

The scenario:

Follower B restarted. Leader A sent an entry, and B could not find the previous log entry. A then sent a notifyInstallSnapshot request to B, and B found that its next index was larger than the leader's firstAvailableLogIndex (the index at which to install the snapshot). A updated B's index according to the reply and sent the entries to B.

A looks up the TermIndex of the previous entry through ``getPrevious(long nextIndex)``. If B's nextIndex is exactly the startIndex of leader A's raft log (B needs the entries starting from A's firstAvailableLogIndex), the entry at nextIndex - 1 has already been purged from A's raft log. A then checks whether the index of the snapshot returned by server.getStateMachine().getLatestSnapshot() equals nextIndex - 1; if it does not, ``getPrevious`` returns null.

The reason:

The problem comes from the uncertainty of raft log purging. If A has also been stopped, which triggers takeSnapshot, the raft log may not be purged up to the snapshot index. The latest snapshot index from the state machine then does not match the raft log's first available index, which leads to this corner case. We could add a check for this case in ``getPrevious``.

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/RATIS-1707

## How was this patch tested?

/
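
The previous-entry lookup described in the scenario above can be illustrated with a minimal, self-contained sketch. It is not the Ratis implementation: `TermIndex`, `Leader`, and `leaderWithLog` are hypothetical stand-ins, the log is modelled as a simple index-to-term map, and log indices are assumed to start at 0. The sketch only reproduces the behaviour described in this pull request (look in the log first, then accept the snapshot only if its index equals nextIndex - 1) and shows how a snapshot that is ahead of the purge point makes the lookup return null.

```java
import java.util.TreeMap;

public class GetPreviousSketch {
  /** Hypothetical stand-in for a (term, index) pair. */
  static final class TermIndex {
    final long term;
    final long index;

    TermIndex(long term, long index) {
      this.term = term;
      this.index = index;
    }

    @Override
    public String toString() {
      return "(t:" + term + ", i:" + index + ")";
    }
  }

  /** Hypothetical leader state: retained log entries (index -> term) plus the latest snapshot. */
  static final class Leader {
    final TreeMap<Long, Long> log = new TreeMap<>();
    TermIndex latestSnapshot; // may be null

    /** Lookup as described in the PR text: log first, then an exact snapshot match, else null. */
    TermIndex getPrevious(long nextIndex) {
      if (nextIndex == 0) {
        return null; // assumption: 0 is the first valid index, so nothing precedes it
      }
      final long previousIndex = nextIndex - 1;
      final Long term = log.get(previousIndex);
      if (term != null) {
        return new TermIndex(term, previousIndex);
      }
      if (latestSnapshot != null && latestSnapshot.index == previousIndex) {
        return latestSnapshot;
      }
      return null; // previous entry is purged and the snapshot index does not match
    }
  }

  /** Builds a leader whose log retains [first, last] in a single term. */
  static Leader leaderWithLog(long first, long last, long term, TermIndex snapshot) {
    final Leader leader = new Leader();
    for (long i = first; i <= last; i++) {
      leader.log.put(i, term);
    }
    leader.latestSnapshot = snapshot;
    return leader;
  }

  public static void main(String[] args) {
    // Log purged exactly up to the snapshot index: the snapshot supplies the previous TermIndex.
    final Leader aligned = leaderWithLog(100, 200, 3, new TermIndex(3, 99));
    System.out.println(aligned.getPrevious(100)); // prints (t:3, i:99)

    // Snapshot taken at 150 but log purged only up to 99: index 99 is in neither place.
    final Leader misaligned = leaderWithLog(100, 200, 3, new TermIndex(3, 150));
    System.out.println(misaligned.getPrevious(100)); // prints null
  }
}
```

In the second case the follower would receive entries starting at index 100 with a null previous TermIndex, which matches the "previous is null but entries[0].getIndex()=..." failure reported above.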
