Hello everyone,

I was exploring the snapshot restore capability of Ratis and found one
scenario that failed.

1. Start a 3-node Ratis cluster and perform some updates to the state
machine.
2. Take a snapshot. The snapshot uses the format term_index. Here the
term is initially 1, and let's assume the index is at 10.
3. Kill the leader. After the new election, the term increases to 2.
4. Perform some more updates and trigger another snapshot. Let's assume
the index is at 20 and the term is at 2.
5. Stop all nodes.
6. Start the nodes again. The following failure is observed during startup:

```
Failed updateLastAppliedTermIndex: newTI = (t:1, i:21) < oldTI = (t:2, i:20)
```

Based on the error logs, I suspect that the state machine updated its last
applied term index to (t:2, i:20), while ServerState tracks currentTerm in
a separate variable that is initialized to 0 at startup. Once a leader is
elected, it tries to update the log entry at (t:1, i:21), but the update
fails the precondition check because the new term index is lower than the
old one.
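
To make the suspected comparison concrete, here is a minimal sketch of how
such a check could behave. It assumes term indices are ordered by term
first and then by index, which is what the error message suggests. The
classes TermIndexCheck, TermIndex and LastApplied are hypothetical
stand-ins written for illustration, not the Ratis implementation.

```java
public class TermIndexCheck {

    /** Hypothetical stand-in for a (term, index) pair; not the Ratis class. */
    static final class TermIndex implements Comparable<TermIndex> {
        final long term;
        final long index;

        TermIndex(long term, long index) {
            this.term = term;
            this.index = index;
        }

        // Assumed ordering: compare terms first, then indices.
        @Override
        public int compareTo(TermIndex other) {
            int byTerm = Long.compare(term, other.term);
            return byTerm != 0 ? byTerm : Long.compare(index, other.index);
        }

        @Override
        public String toString() {
            return "(t:" + term + ", i:" + index + ")";
        }
    }

    /** Hypothetical holder that refuses to move the last applied term index backwards. */
    static final class LastApplied {
        private TermIndex current;

        LastApplied(TermIndex initial) {
            this.current = initial;
        }

        void update(TermIndex newTI) {
            // Precondition: the new term index must not be lower than the old one.
            if (newTI.compareTo(current) < 0) {
                throw new IllegalStateException(
                        "Failed update: newTI = " + newTI + " < oldTI = " + current);
            }
            current = newTI;
        }

        TermIndex get() {
            return current;
        }
    }

    public static void main(String[] args) {
        // State restored from the second snapshot (term 2, index 20).
        LastApplied lastApplied = new LastApplied(new TermIndex(2, 20));
        try {
            // Entry written after restart with a term that started over.
            lastApplied.update(new TermIndex(1, 21));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this ordering, a higher index cannot compensate for a lower term, so
any entry written with a term below 2 is rejected even though its index
moved forward.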

What's the correct way to solve this problem? Should the term be reset to 0
while loading the snapshot at the server startup?

References:
https://github.com/apache/ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java#L82
https://github.com/apache/ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/statemachine/impl/BaseStateMachine.java#L138

Thank you for looking into this issue.


Regards,
Snehasish
