> ... but as each file will be backed up at a different time, it can create
subtle inconsistencies.

Is it possible to stop the server, back it up and then restart it?
This is the safest approach.

Off the top of my head, it seems to me that it won't have subtle
inconsistencies:
 - Most of the files remain unchanged after format.
 - There is a metafile which will be updated atomically when there is a
leader change.
 - We may take a snapshot first and skip backing up the RaftLog.
Of course, we should test it carefully before making such a claim (i.e. no
subtle inconsistencies).
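
For illustration, the stop-then-copy approach could look like the sketch
below (the class name and paths are hypothetical, not a Ratis API; it assumes
the server is fully stopped so every file in the storage directory is
quiescent):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

/** Sketch: copy a stopped server's Ratis storage directory to a backup
 *  location.  Only safe while the server is down. */
public final class StorageBackup {
  public static void copyDirectory(Path src, Path dst) throws IOException {
    // Files.walk is depth-first, so parent directories are created
    // before the files they contain.
    try (Stream<Path> paths = Files.walk(src)) {
      paths.forEach(p -> {
        try {
          // Recreate the same relative layout under dst.
          Files.copy(p, dst.resolve(src.relativize(p)),
              StandardCopyOption.COPY_ATTRIBUTES);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    }
  }
}
```

Since the server is stopped, every file is copied at the same logical point
in time, which avoids the per-file inconsistency concern.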

> Do you foresee any issue that can come because of this?

If setLastAppliedTermIndex() is not called, the state machine will be asked
to re-apply log entries from a lower index.  The state won't be correct.
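
For reference, the ordering check behind the earlier "newTI < oldTI" failure
can be modeled in isolation.  This is a simplified stand-in for illustration,
not the actual Ratis BaseStateMachine code:

```java
/** Simplified stand-in (NOT the actual Ratis classes) for the check behind
 *  "Failed updateLastAppliedTermIndex: newTI = (t:1, i:21) < oldTI = (t:2, i:20)". */
public final class TermIndexModel {
  /** Compare by term first, then by index, as in Raft log ordering. */
  public record TermIndex(long term, long index) implements Comparable<TermIndex> {
    @Override public int compareTo(TermIndex that) {
      int c = Long.compare(this.term, that.term);
      return c != 0 ? c : Long.compare(this.index, that.index);
    }
    @Override public String toString() { return "(t:" + term + ", i:" + index + ")"; }
  }

  private TermIndex lastApplied = new TermIndex(0, -1);

  /** The applied TermIndex must never move backwards. */
  public void updateLastApplied(TermIndex newTI) {
    if (newTI.compareTo(lastApplied) < 0) {
      throw new IllegalStateException("Failed updateLastAppliedTermIndex: newTI = "
          + newTI + " < oldTI = " + lastApplied);
    }
    lastApplied = newTI;
  }

  public TermIndex getLastApplied() { return lastApplied; }
}
```

If loadSnapshot() sets (t:2, i:20) but a freshly formatted cluster elects a
leader at term 1, applying the entry at (t:1, i:21) trips exactly this check.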

Tsz-Wo


On Mon, Mar 9, 2026 at 3:02 AM Snehasish Roy <[email protected]>
wrote:

> Hi Tsz-Wo,
>
> I was able to bypass this issue by not updating the termIndex in the
> loadSnapshot() method. Previously I was updating the last applied term
> index after updating the state machine in the loadSnapshot() method by
> calling setLastAppliedTermIndex().
>
> Do you foresee any issue that can come because of this?
>
>
> Regards,
> Snehasish
>
> On Mon, 9 Mar 2026 at 08:33, Snehasish Roy <[email protected]>
> wrote:
>
> > Hi Tsz-Wo,
> >
> > Thank you for your prompt response. Backing up the entire directory is
> > feasible but as each file will be backed up at a different time, it can
> > create subtle inconsistencies.
> > Could you please guide me a bit on how to recreate the metadata from the
> > snapshot?
> >
> > I am sure it would be helpful for others looking to back up their
> > Ratis State Machine.
> >
> >
> > Regards,
> > Snehasish
> >
> >
> > On Sat, 7 Mar 2026 at 02:30, Tsz Wo Sze <[email protected]> wrote:
> >
> >> Hi Snehasish,
> >>
> >> I see your requirement now.  For backup and restore, one easy way is to
> >> back up the entire Ratis storage directory, not just the snapshot.
> >> Although it is possible to recreate the other Ratis metadata from a
> >> snapshot, you would need to understand a great deal of Ratis in order
> >> to do so.  Just copying a snapshot won't work since it also needs other
> >> Ratis metadata.  Would it work for you to just back up the entire
> >> directory?
> >>
> >> Tsz-Wo
> >>
> >>
> >> On Fri, Mar 6, 2026 at 2:23 AM Snehasish Roy <[email protected]>
> >> wrote:
> >>
> >> > Addendum to the above email: I understand that the S3 snapshots can
> >> > be stale, but as all the nodes are gone, I don't have a way to get
> >> > the latest data; I just need a way to restore from the last known
> >> > good checkpoint.  If the majority of the nodes are still available,
> >> > followers can easily get the data from the leader and build the
> >> > state machine.
> >> >
> >> >
> >> > Regards,
> >> > Snehasish
> >> >
> >> > On Fri, 6 Mar 2026 at 15:31, Snehasish Roy <[email protected]>
> >> > wrote:
> >> >
> >> > > Hi Tsz Wo,
> >> > >
> >> > > Thank you again for the prompt response. Kindly let me take a step
> >> > > back and explain what I am trying to solve.
> >> > > I want to ensure durability of the State Machine in case all the
> >> > > nodes go down.
> >> > >
> >> > > If I am running a 3-node Ratis cluster and all the nodes go down
> >> > > due to some physical hardware failure, I need a way to ensure that
> >> > > when a new node spawns, it is able to restore the state.
> >> > > To do so, I am thinking of taking periodic snapshots to durable
> >> > > storage, e.g. S3, and when a new node spawns (which should be
> >> > > handled by another service), it can pull the snapshot from S3 and
> >> > > restore the state.
> >> > >
> >> > > To simulate this scenario, I clean the storage directories of the
> >> > > Ratis nodes before starting them up so they don't have any previous
> >> > > state, and let the nodes pull the snapshot from a separate
> >> > > directory.
> >> > > Please let me know if there is some other way I can solve this
> >> > > problem.
> >> > >
> >> > > Hope this helps.
> >> > >
> >> > > Regards,
> >> > > Snehasish
> >> > >
> >> > > On Thu, 5 Mar 2026 at 23:50, Tsz Wo Sze <[email protected]>
> wrote:
> >> > >
> >> > >> Hi Snehasish,
> >> > >>
> >> > >> > Once the snapshot is triggered, I move it to a different
> >> > >> > directory to simulate clean restart.
> >> > >>
> >> > >> Is this step required to reproduce the failure?  If a snapshot has
> >> > >> been taken, the server expects that the snapshot is there, and it
> >> > >> may delete the raft logs to free up space.  If this step is
> >> > >> required to reproduce the failure, it does not look like a bug.
> >> > >>
> >> > >> In general, we cannot manually move the Ratis metadata around.  It
> >> > >> is just like manually moving system files around in Linux or
> >> > >> Windows: the system may not be able to restart.
> >> > >>
> >> > >> Tsz-Wo
> >> > >>
> >> > >>
> >> > >>
> >> > >> On Thu, Mar 5, 2026 at 10:08 AM Tsz Wo Sze <[email protected]>
> >> wrote:
> >> > >>
> >> > >> > Hi Snehasish,
> >> > >> >
> >> > >> > Since you already have a test, could you share the code change?
> >> > >> > You may attach a patch file or create a pull request.  I will
> >> > >> > run it to reproduce the failure.
> >> > >> >
> >> > >> > In the meantime, I will try to understand the details you
> >> > >> > provided.
> >> > >> >
> >> > >> > Tsz-Wo
> >> > >> >
> >> > >> >
> >> > >> > On Thu, Mar 5, 2026 at 3:14 AM Snehasish Roy <
> >> > [email protected]>
> >> > >> > wrote:
> >> > >> >
> >> > >> >> Hi Tsz-Wo,
> >> > >> >>
> >> > >> >> Thank you for your prompt response. I was able to reproduce
> >> > >> >> this issue using the CounterStateMachine.
> >> > >> >>
> >> > >> >> I added a utility in the CounterClient to trigger a snapshot.
> >> > >> >>
> >> > >> >> ```
> >> > >> >> private void takeSnapshot() throws IOException {
> >> > >> >>     RaftClientReply raftClientReply = client.getSnapshotManagementApi()
> >> > >> >>             .create(true, 30_000);
> >> > >> >>     System.out.println(raftClientReply);
> >> > >> >> }
> >> > >> >> ```
> >> > >> >>
> >> > >> >> Once the snapshot is triggered, I move it to a different
> >> > >> >> directory to simulate a clean restart.
> >> > >> >>
> >> > >> >> I also updated SimpleStateMachineStorage::loadLatestSnapshot()
> >> > >> >> to look for snapshots in a different directory.
> >> > >> >>
> >> > >> >> ```
> >> > >> >> public SingleFileSnapshotInfo loadLatestSnapshot() {
> >> > >> >>     final File dir = new File("/tmp/snapshots");
> >> > >> >>     // ... rest of the original method, scanning dir instead of
> >> > >> >>     // the default state machine directory
> >> > >> >> }
> >> > >> >> ```
> >> > >> >>
> >> > >> >> Full steps for reproduction:
> >> > >> >> 1. Started a 3-node CounterServer and performed some updates to
> >> > >> >> the state machine using the CounterClient.
> >> > >> >>
> >> > >> >> 2. Triggered the snapshot via the CounterClient and then moved
> >> > >> >> the snapshot to a different directory.  The snapshot file is of
> >> > >> >> the format term_index; here the term will initially be 1, and
> >> > >> >> let's assume the index is at 10.
> >> > >> >>
> >> > >> >> 3. Killed the leader; the term would have increased to 2.
> >> > >> >>
> >> > >> >> 4. Performed some updates and triggered another snapshot.  Let's
> >> > >> >> assume the index is at 20 and the term is at 2.  Moved the
> >> > >> >> snapshot to the different directory.
> >> > >> >>
> >> > >> >> 5. Stopped all nodes.  Cleared the storage directories of all
> >> > >> >> the nodes to simulate a clean restart.
> >> > >> >>
> >> > >> >> 6. Started the 3-node CounterServer and observed the failure at
> >> > >> >> startup.
> >> > >> >>
> >> > >> >> ```
> >> > >> >> 2026-03-05 15:48:56 INFO  SimpleStateMachineStorage:229 - Latest snapshot is SingleFileSnapshotInfo(t:2, i:20):[/tmp/snapshots/snapshot.2_20] in /tmp/snapshots
> >> > >> >> 2026-03-05 15:48:56 INFO  SimpleStateMachineStorage:229 - Latest snapshot is SingleFileSnapshotInfo(t:2, i:20):[/tmp/snapshots/snapshot.2_20] in /tmp/snapshots
> >> > >> >> 2026-03-05 15:48:56 INFO  RaftServerConfigKeys:62 - raft.server.log.use.memory = false (default)
> >> > >> >> 2026-03-05 15:48:56 INFO  RaftServer$Division:155 - n0@group-ABB3109A44C1: getLatestSnapshot(CounterStateMachine-1:n0:group-ABB3109A44C1) returns SingleFileSnapshotInfo(t:2, i:20):[/tmp/snapshots/snapshot.2_20]
> >> > >> >> 2026-03-05 15:48:56 INFO  RaftLog:90 - n0@group-ABB3109A44C1-SegmentedRaftLog: snapshotIndexFromStateMachine = 20
> >> > >> >> ....
> >> > >> >> 2026-03-05 15:49:02 INFO  RaftServer$Division:577 - n1@group-ABB3109A44C1: set firstElectionSinceStartup to false for becomeLeader
> >> > >> >> 2026-03-05 15:49:02 INFO  RaftServer$Division:278 - n1@group-ABB3109A44C1: change Leader from null to n1 at term 1 for becomeLeader, leader elected after 672ms
> >> > >> >> 2026-03-05 15:49:02 INFO  SegmentedRaftLogWorker:440 - n1@group-ABB3109A44C1-SegmentedRaftLogWorker: Starting segment from index:21
> >> > >> >> 2026-03-05 15:49:02 INFO  SegmentedRaftLogWorker:647 - n1@group-ABB3109A44C1-SegmentedRaftLogWorker: created new log segment /ratis/./n1/02511d47-d67c-49a3-9011-abb3109a44c1/current/log_inprogress_21
> >> > >> >> ....
> >> > >> >> 2026-03-05 15:49:02 INFO  RaftServer$Division:309 - Leader n1@group-ABB3109A44C1-LeaderStateImpl is ready since appliedIndex == startIndex == 21
> >> > >> >> 2026-03-05 15:49:02 ERROR StateMachineUpdater:207 - n1@group-ABB3109A44C1-StateMachineUpdater caught a Throwable.
> >> > >> >> 2026-03-05 15:49:02 ERROR StateMachineUpdater:207 - n1@group-ABB3109A44C1-StateMachineUpdater caught a Throwable.
> >> > >> >> java.lang.IllegalStateException: n1: Failed updateLastAppliedTermIndex: newTI = (t:1, i:21) < oldTI = (t:2, i:20)
> >> > >> >>     at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:77)
> >> > >> >>     at org.apache.ratis.statemachine.impl.BaseStateMachine.updateLastAppliedTermIndex(BaseStateMachine.java:148)
> >> > >> >>     at org.apache.ratis.statemachine.impl.BaseStateMachine.updateLastAppliedTermIndex(BaseStateMachine.java:139)
> >> > >> >>     at org.apache.ratis.statemachine.impl.BaseStateMachine.notifyTermIndexUpdated(BaseStateMachine.java:135)
> >> > >> >>     at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1893)
> >> > >> >>     at org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:255)
> >> > >> >>     at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:194)
> >> > >> >>     at java.base/java.lang.Thread.run(Thread.java:1575)
> >> > >> >> 2026-03-05 15:49:02 INFO  RaftServer$Division:528 - n1@group-ABB3109A44C1: shutdown
> >> > >> >> ```
> >> > >> >>
> >> > >> >> As you can see from the stack trace, during the snapshot
> >> > >> >> restore the termIndex was updated to the latest value seen in
> >> > >> >> the snapshot, (t:2, i:20), but when the server was started from
> >> > >> >> a clean slate, the term was reset to 1 by the RaftServerImpl at
> >> > >> >> startup.  It then tries to apply the log entries and fails
> >> > >> >> because of the precondition check that the term should be
> >> > >> >> monotonically increasing across log entries.
> >> > >> >>
> >> > >> >> Please let me know if you need more information.
> >> > >> >>
> >> > >> >> Regards
> >> > >> >>
> >> > >> >> On Wed, 4 Mar 2026 at 06:33, Tsz Wo Sze <[email protected]>
> >> wrote:
> >> > >> >>
> >> > >> >> > Hi Snehasish,
> >> > >> >> >
> >> > >> >> > > ... newTI = (t:1, i:21) ...
> >> > >> >> >
> >> > >> >> > The newTI was invalid.  It probably came from the state
> >> > >> >> > machine.  It should just use the TermIndex from the
> >> > >> >> > LogEntryProto.  See CounterStateMachine [1] as an example.
> >> > >> >> >
> >> > >> >> > Tsz-Wo
> >> > >> >> > [1]
> >> > >> >> > https://github.com/apache/ratis/blob/3d9f5af376409de7e635bb67c7dfbeadc882c413/ratis-examples/src/main/java/org/apache/ratis/examples/counter/server/CounterStateMachine.java#L263-L266
> >> > >> >> >
> >> > >> >> > On Tue, Mar 3, 2026 at 10:52 AM Snehasish Roy via dev <
> >> > >> >> > [email protected]>
> >> > >> >> > wrote:
> >> > >> >> >
> >> > >> >> > > Hello everyone,
> >> > >> >> > >
> >> > >> >> > > I was exploring the snapshot restore capability of Ratis
> >> > >> >> > > and found one scenario that failed.
> >> > >> >> > >
> >> > >> >> > > 1. Start a 3-node Ratis cluster and perform some updates to
> >> > >> >> > > the state machine.
> >> > >> >> > > 2. Take a snapshot; the snapshot file is of the format
> >> > >> >> > > term_index.  Here the term will initially be 1, and let's
> >> > >> >> > > assume the index is at 10.
> >> > >> >> > > 3. Kill the leader; the term would have increased to 2.
> >> > >> >> > > 4. Perform some updates and trigger another snapshot.
> >> > >> >> > > Let's assume the index is at 20 and the term is at 2.
> >> > >> >> > > 5. Stop all nodes.
> >> > >> >> > > 6. A failure is observed while starting the nodes.
> >> > >> >> > >
> >> > >> >> > > ```
> >> > >> >> > > Failed updateLastAppliedTermIndex: newTI = (t:1, i:21) < oldTI = (t:2, i:20)
> >> > >> >> > > ```
> >> > >> >> > >
> >> > >> >> > > Based on the error logs, I suspect the state machine
> >> > >> >> > > updated the last applied term index to (t:2, i:20), but the
> >> > >> >> > > ServerState has a separate variable tracking the
> >> > >> >> > > currentTerm which is initialized to 0 at startup.  Once the
> >> > >> >> > > leader was elected, it tried to update the log entry, but
> >> > >> >> > > the update failed due to the precondition check.
> >> > >> >> > >
> >> > >> >> > > What's the correct way to solve this problem?  Should the
> >> > >> >> > > term be reset to 0 while loading the snapshot at server
> >> > >> >> > > startup?
> >> > >> >> > >
> >> > >> >> > > References:
> >> > >> >> > > https://github.com/apache/ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java#L82
> >> > >> >> > > https://github.com/apache/ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/statemachine/impl/BaseStateMachine.java#L138
> >> > >> >> > >
> >> > >> >> > > Thank you for looking into this issue.
> >> > >> >> > >
> >> > >> >> > >
> >> > >> >> > > Regards,
> >> > >> >> > > Snehasish
> >> > >> >> > >
> >> > >> >> >
> >> > >> >>
> >> > >> >
> >> > >>
> >> > >
> >> >
> >>
> >
>
