Hello everyone, I was exploring the snapshot restore capability of Ratis and found one scenario that failed.
Steps to reproduce:

1. Start a 3-node Ratis cluster and perform some updates to the state machine.
2. Take a snapshot. The snapshot will be of the format term_index. Here the term will initially be 1, and let's assume the index is at 10.
3. Kill the leader. The term will then have increased to 2.
4. Perform some more updates and trigger another snapshot. Let's assume the index is at 20 and the term is at 2.
5. Stop all nodes.
6. Start the nodes again. A failure is observed while starting the node:

```
Failed updateLastAppliedTermIndex: newTI = (t:1, i:21) < oldTI = (t:2, i:20)
```

Based on the error logs, I suspect the state machine updated the last applied term index to t:2, i:20 from the snapshot, but ServerState has a separate variable for tracking the currentTerm, which is initialized to 0 at startup. Once the leader is elected, it tries to update the log entry, but the update fails due to the precondition check.

What is the correct way to solve this problem? Should the term be reset to 0 while loading the snapshot at server startup?

References:
https://github.com/apache/ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java#L82
https://github.com/apache/ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/statemachine/impl/BaseStateMachine.java#L138

Thank you for looking into this issue.

Regards,
Snehasish
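
P.S. To make the failing comparison concrete, here is a minimal standalone sketch of how I understand the check. It does not use any Ratis classes: `TermIdx` and `canAdvance` are hypothetical names I made up for illustration, and the ordering (term compared first, index only as a tie-breaker) is my assumption based on the error message above, where (t:1, i:21) is reported as smaller than (t:2, i:20).

```java
public class TermIndexOrderingSketch {
  /** Hypothetical stand-in for a (term, index) pair; NOT the Ratis TermIndex class. */
  static final class TermIdx implements Comparable<TermIdx> {
    final long term;
    final long index;

    TermIdx(long term, long index) {
      this.term = term;
      this.index = index;
    }

    @Override
    public int compareTo(TermIdx that) {
      // Assumption: the term is compared first; the index only breaks ties.
      int byTerm = Long.compare(this.term, that.term);
      return byTerm != 0 ? byTerm : Long.compare(this.index, that.index);
    }

    @Override
    public String toString() {
      return "(t:" + term + ", i:" + index + ")";
    }
  }

  /** Hypothetical monotonicity check: the last applied value may never move backwards. */
  static boolean canAdvance(TermIdx oldTI, TermIdx newTI) {
    return newTI.compareTo(oldTI) >= 0;
  }

  public static void main(String[] args) {
    TermIdx fromSnapshot = new TermIdx(2, 20);      // restored from the second snapshot
    TermIdx afterRestart = new TermIdx(1, 21);      // first entry applied after restart
    TermIdx withHigherTerm = new TermIdx(3, 21);    // same index, but a term above the snapshot's

    System.out.println(canAdvance(fromSnapshot, afterRestart));   // prints "false"
    System.out.println(canAdvance(fromSnapshot, withHigherTerm)); // prints "true"
  }
}
```

Under that assumption, the higher index (21 > 20) does not help, because the lower term (1 < 2) decides the comparison. This is why I suspect the term used after restart is the problem rather than the index.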
