Xushaohong opened a new pull request, #745:
URL: https://github.com/apache/ratis/pull/745

   ## What changes were proposed in this pull request?
   
   In our cluster, we saw many exceptions on failed appendEntries requests, 
such as:
   
   Unexpected Index: previous is null but entries[0].getIndex()=3402406
   
   The follower cannot append the entries because the request carries no 
previous entry even though its first entry has index 3402406.
   
   The scenario:
   
   Follower B restarted. Leader A sent entries to B, but B could not find the 
previous log entry. A then sent a notifyInstallSnapshot request to B, and B 
found that its next index was larger than the leader's firstAvailableLogIndex 
(the index at which to install the snapshot). A updated B's next index 
according to the reply and sent entries to B again.
   
   To build that request, A looks up the previous entry's TermIndex through 
``getPrevious(long nextIndex)``. If B's nextIndex is exactly the start index of 
A's raft log (B needs the entries starting from A's firstAvailableLogIndex), 
the entry at nextIndex - 1 has already been purged from A's log. A then checks 
whether the index of the latest snapshot, obtained through 
server.getStateMachine().getLatestSnapshot(), equals nextIndex - 1. If it does 
not, ``getPrevious`` returns null.
   
   The reason:
   The problem is caused by uncertainty about how far the raft log has been 
purged. If A has also been stopped, which triggers takeSnapshot, the raft log 
may not have been purged up to the snapshot index. The latest snapshot index 
from the state machine (SM) then no longer matches the raft log's first 
available index, which leads to this corner case.
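   
   The lookup and the mismatch described above can be sketched roughly as 
follows. This is a simplified, self-contained model and not the Ratis code: 
the class ``PreviousLookupSketch``, the nested ``TermIndex`` stand-in, the 
``TreeMap`` used as a log, and the term and snapshot values are all 
assumptions made for illustration. Only the index 3402406 comes from the 
exception above.

```java
import java.util.TreeMap;

public class PreviousLookupSketch {
    /** Hypothetical stand-in for a (term, index) pair; not the Ratis class. */
    static final class TermIndex {
        final long term;
        final long index;

        TermIndex(long term, long index) {
            this.term = term;
            this.index = index;
        }

        @Override
        public String toString() {
            return "(t:" + term + ", i:" + index + ")";
        }
    }

    /**
     * Simplified model of the lookup: try the log first, then fall back to
     * the latest snapshot only if it sits exactly at nextIndex - 1.
     * The log is modelled as a map from index to term (an assumption).
     */
    static TermIndex getPrevious(TreeMap<Long, Long> logTerms,
                                 TermIndex latestSnapshot,
                                 long nextIndex) {
        if (nextIndex == 0) {
            return null; // nothing precedes the first index
        }
        final long previousIndex = nextIndex - 1;
        final Long term = logTerms.get(previousIndex);
        if (term != null) {
            return new TermIndex(term, previousIndex);
        }
        if (latestSnapshot != null && latestSnapshot.index == previousIndex) {
            return latestSnapshot;
        }
        return null; // previous entry purged and snapshot not aligned
    }

    public static void main(String[] args) {
        // Leader log after purging: first available index is 3402406.
        final TreeMap<Long, Long> log = new TreeMap<>();
        for (long i = 3402406L; i <= 3402410L; i++) {
            log.put(i, 5L); // term 5 is an arbitrary illustrative value
        }

        // Snapshot taken on shutdown, ahead of the purge point (assumed value).
        final TermIndex snapshotAhead = new TermIndex(5L, 3402408L);
        // Snapshot aligned with the purge point, for comparison.
        final TermIndex snapshotAligned = new TermIndex(5L, 3402405L);

        System.out.println(getPrevious(log, snapshotAhead, 3402406L));
        System.out.println(getPrevious(log, snapshotAligned, 3402406L));
    }
}
```

   In this model, the first lookup returns null because index 3402405 is 
neither in the log nor equal to the snapshot index, which corresponds to the 
"previous is null" case. The second lookup returns the snapshot's TermIndex 
because the snapshot sits exactly at nextIndex - 1.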
   
   We could add a check for this case in ``getPrevious``.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/RATIS-1707
   
   ## How was this patch tested?
   /
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.