[jira] [Updated] (RATIS-1707) Fix corner case when getPrevious in LogAppenderBase

Tsz-wo Sze (Jira) Thu, 22 Sep 2022 02:15:05 -0700


     [ 
https://issues.apache.org/jira/browse/RATIS-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Tsz-wo Sze updated RATIS-1707:
------------------------------
    Component/s: server

> Fix corner case when getPrevious in LogAppenderBase
> ---------------------------------------------------
>
>                 Key: RATIS-1707
>                 URL: https://issues.apache.org/jira/browse/RATIS-1707
>             Project: Ratis
>          Issue Type: Bug
>          Components: server
>            Reporter: Xu Shao Hong
>            Assignee: Xu Shao Hong
>            Priority: Major
>         Attachments: 745_review.patch, ratis.png
>
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> In our cluster, we found many {*}Failed appendEntries{*} exceptions:
> _Unexpected Index: previous is null but entries[0].getIndex()=3402406._
> As a result, the follower cannot append the entry.
> The scenario:
> Follower B restarted. Leader A sent an entry, and B could not find the 
> previous log entry. A then sent a notifyInstallSnapshot request to B, and B 
> found that its next index was larger than the leader's 
> firstAvailableLogIndex (the index at which to install the snapshot). A 
> updated B's index according to the reply and sent the entries to B. A looks 
> up the TermIndex of the previous entry through 
> ``getPrevious(long nextIndex)``. If B's nextIndex is exactly the startIndex 
> of A's raft log (B needs the entries starting from A's 
> firstAvailableLogIndex), the previous entry has already been purged from A's 
> log, so A checks whether the index of the snapshot returned by 
> +server.getStateMachine().getLatestSnapshot()+ equals nextIndex - 1. If it 
> does not, getPrevious returns null. 
> The problem is caused by the uncertainty in when the raft log is purged. If 
> A has also been stopped, which triggers takeSnapshot, the raft log may not 
> have been purged up to the snapshot index. The latest snapshot index from the 
> state machine is then not equal to the raft log's first available index, 
> which leads to this corner case.
> We could add a check for this case in getPrevious.
>  
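
The lookup described above can be modelled with a small, self-contained
sketch. It is not the Ratis source: the class GetPreviousSketch, the nested
TermIndex class, and the TreeMap standing in for the raft log are all
hypothetical names chosen for illustration. It only reproduces the reported
behaviour, where the lookup returns null once the snapshot index is ahead of
the purged part of the log, and it does not attempt to show the fix.

```java
import java.util.Objects;
import java.util.TreeMap;

public class GetPreviousSketch {
    /** Hypothetical stand-in for a (term, index) pair. */
    static final class TermIndex {
        final long term;
        final long index;

        TermIndex(long term, long index) {
            this.term = term;
            this.index = index;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof TermIndex)) {
                return false;
            }
            final TermIndex that = (TermIndex) o;
            return term == that.term && index == that.index;
        }

        @Override
        public int hashCode() {
            return Objects.hash(term, index);
        }

        @Override
        public String toString() {
            return "(t:" + term + ", i:" + index + ")";
        }
    }

    /**
     * Simplified model of the lookup in the description.
     * log maps index -> term for the entries that have not been purged;
     * snapshot is the latest snapshot's TermIndex, or null if there is none.
     */
    static TermIndex getPrevious(TreeMap<Long, Long> log, TermIndex snapshot, long nextIndex) {
        final long previousIndex = nextIndex - 1;
        // First try the raft log itself.
        final Long term = log.get(previousIndex);
        if (term != null) {
            return new TermIndex(term, previousIndex);
        }
        // Fall back to the snapshot only if it ends exactly at previousIndex.
        if (snapshot != null && snapshot.index == previousIndex) {
            return snapshot;
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed state: entries below 100 were purged, so startIndex is 100.
        final TreeMap<Long, Long> log = new TreeMap<>();
        for (long i = 100; i <= 120; i++) {
            log.put(i, 2L);
        }
        // Snapshot index == startIndex - 1: the snapshot supplies the previous TermIndex.
        System.out.println(getPrevious(log, new TermIndex(2, 99), 100));
        // Snapshot taken at 110 but log purged only through 99: lookup yields null.
        System.out.println(getPrevious(log, new TermIndex(2, 110), 100));
    }
}
```

With the assumed state in main, the first call finds the previous TermIndex
through the snapshot, while the second returns null even though the follower
asked for exactly the leader's first available entry. That null is what
surfaces on the follower as "previous is null".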



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
