Xu Shao Hong created RATIS-1707:
-----------------------------------

             Summary: Fix corner case when getPrevious in LogAppenderBase
                 Key: RATIS-1707
                 URL: https://issues.apache.org/jira/browse/RATIS-1707
             Project: Ratis
          Issue Type: Bug
            Reporter: Xu Shao Hong
            Assignee: Xu Shao Hong


In the cluster, we observed many {*}Failed appendEntries{*} exceptions such as 
the following.

_Unexpected Index: previous is null but entries[0].getIndex()=3402406. So the 
follower cannot append an entry due to some limit._

The scenario:

Follower B restarted. Leader A sent an entry, but B could not find the previous 
log entry. A then sent a notifyInstallSnapshot request to B, and B found that 
its next index was larger than the leader's firstAvailableLogIndex (the index 
at which to install the snapshot). A updated B's index according to the reply 
and sent the entries to B.

A looks up the TermIndex of the previous entry through ``getPrevious(long 
nextIndex)``. If B's nextIndex is exactly the same as leader A's startIndex 
(that is, B needs the entries starting from A's firstAvailableLogIndex), the 
previous entry has already been purged from A's raft log. A then checks 
whether the latest snapshot index from 
+server.getStateMachine().getLatestSnapshot()+ equals nextIndex - 1; if it does 
not, getPrevious returns null. 

The root cause is that the raft log is not guaranteed to be purged exactly up 
to the snapshot index. If A has also been stopped, which triggers takeSnapshot, 
the raft log may not have been purged up to the new snapshot index. The latest 
snapshot index from the state machine (SM) then does not match the raft log's 
first available index, which leads to this corner case.
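
To make the corner case concrete, here is a minimal, self-contained sketch of 
the lookup described above. It is a simplified model and not the actual 
LogAppenderBase code: the class GetPreviousSketch, its TermIndex stand-in, the 
logOf helper, and the use of a TreeMap as the "raft log" are all assumptions 
made for illustration.

```java
import java.util.TreeMap;

/** Simplified, hypothetical model of the lookup; not the actual Ratis classes. */
final class GetPreviousSketch {
  /** Minimal stand-in for a (term, index) pair. */
  static final class TermIndex {
    final long term;
    final long index;

    TermIndex(long term, long index) {
      this.term = term;
      this.index = index;
    }
  }

  /**
   * Looks up the entry preceding nextIndex.
   *
   * @param log            entries still present in the leader's log (index -> term)
   * @param latestSnapshot the state machine's latest snapshot, or null if none
   * @return the previous TermIndex, or null if neither source can provide it
   */
  static TermIndex getPrevious(long nextIndex, TreeMap<Long, Long> log,
      TermIndex latestSnapshot) {
    final long previousIndex = nextIndex - 1;
    final Long term = log.get(previousIndex);
    if (term != null) {
      // The previous entry is still in the log.
      return new TermIndex(term, previousIndex);
    }
    if (latestSnapshot != null && latestSnapshot.index == previousIndex) {
      // The log was purged exactly up to the snapshot index.
      return latestSnapshot;
    }
    // Neither the log nor the snapshot matches previousIndex.
    return null;
  }

  /** Builds a log holding entries [start, end], all in the given term. */
  static TreeMap<Long, Long> logOf(long start, long end, long term) {
    final TreeMap<Long, Long> log = new TreeMap<>();
    for (long i = start; i <= end; i++) {
      log.put(i, term);
    }
    return log;
  }

  public static void main(String[] args) {
    final TreeMap<Long, Long> log = logOf(100, 110, 2);
    // Purge reached the snapshot index (99): previous comes from the snapshot.
    System.out.println(getPrevious(100, log, new TermIndex(2, 99)) != null);
    // Snapshot taken at 105 but the log was purged only up to 99: lookup fails.
    System.out.println(getPrevious(100, log, new TermIndex(2, 105)) == null);
  }
}
```

In the second call the snapshot index (105) is ahead of the log's first 
available index (100), so nextIndex - 1 = 99 matches neither the log nor the 
snapshot, and the lookup returns null.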

We could add a check for this case in getPrevious.
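
This issue does not spell out what the check would be. As one possible shape, 
and only as a sketch, the lookup could first classify the situation so that 
"previous entry purged but already covered by a newer snapshot" is 
distinguished from a truly unknown index. The class PreviousLookupCheck, the 
Result enum, and the convention that -1 means "no snapshot" are all 
hypothetical; how the caller should react to each result is left open.

```java
/** Hypothetical classification of a getPrevious lookup; not Ratis code. */
final class PreviousLookupCheck {
  enum Result {
    IN_LOG,               // previous entry is still present in the log
    AT_SNAPSHOT,          // previous index equals the latest snapshot index
    COVERED_BY_SNAPSHOT,  // previous entry purged, snapshot is ahead of it
    UNKNOWN               // no source covers the previous index
  }

  /**
   * @param nextIndex     the follower's next index
   * @param logStartIndex first index still available in the leader's log
   * @param logLastIndex  last index in the leader's log
   * @param snapshotIndex latest snapshot index, or -1 if there is no snapshot
   */
  static Result classify(long nextIndex, long logStartIndex, long logLastIndex,
      long snapshotIndex) {
    final long previousIndex = nextIndex - 1;
    if (previousIndex >= logStartIndex && previousIndex <= logLastIndex) {
      return Result.IN_LOG;
    }
    if (snapshotIndex == previousIndex) {
      return Result.AT_SNAPSHOT;
    }
    if (previousIndex < logStartIndex && snapshotIndex > previousIndex) {
      // The corner case: the log was not purged up to the snapshot index.
      return Result.COVERED_BY_SNAPSHOT;
    }
    return Result.UNKNOWN;
  }
}
```

With a log covering indices 100 to 110 and a snapshot at 105, 
classify(100, 100, 110, 105) yields COVERED_BY_SNAPSHOT, which is the 
situation that currently surfaces as "previous is null".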

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
