[
https://issues.apache.org/jira/browse/RATIS-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tsz-wo Sze resolved RATIS-2291.
-------------------------------
Fix Version/s: 3.2.0
Resolution: Fixed
The pull request is now merged. Thanks, [~slfan1989]!
> Fix failing
> TestInstallSnapshotNotificationWithGrpc#testAddNewFollowersNoSnapshot
> ---------------------------------------------------------------------------------
>
> Key: RATIS-2291
> URL: https://issues.apache.org/jira/browse/RATIS-2291
> Project: Ratis
> Issue Type: Bug
> Components: test
> Reporter: Shilun Fan
> Assignee: Shilun Fan
> Priority: Major
> Fix For: 3.2.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> During the investigation of RATIS-2251, we encountered a persistent unit test
> failure in
> TestInstallSnapshotNotificationWithGrpc#testAddNewFollowersNoSnapshot.
> Initially, we suspected this was caused by the JUnit version upgrade, but
> further analysis confirms that the test also fails under JUnit 4. Detailed
> discussions and debugging steps can be found in the comments of PR #1227.
>
> While debugging the unit test
> {{TestInstallSnapshotNotificationWithGrpc#testAddNewFollowersNoSnapshot}},
> I found that the error persists even after applying the fix from RATIS-2045.
> RATIS-2045 fixed the issue where SnapshotInstallationHandler didn't notify
> followers to install snapshots when the snapshot index was -1 and the
> leader's firstAvailableLogIndex was 0 (PR
> [#1053|https://github.com/apache/ratis/pull/1053]).
> PR #1053 changes whether followers pull snapshots from the leader.
> The logs show that the newly added followers {{s1}} and {{s2}} have both
> synchronized snapshots from the leader {{s0}}. As a result, the snapshot
> index for followers {{s1}} and {{s2}} becomes {{16}} (16 because we manually
> created messages twice), instead of {{-1}}.
> Therefore, the test's current check condition, which expects {{-1}}, no
> longer holds.
>
> {code:java}
> follower s1:
> 2025-05-10 17:23:10,134 [grpc-default-executor-2] INFO
> impl.SnapshotInstallationHandler
> (SnapshotInstallationHandler.java:notifyStateMachineToInstallSnapshot(262)) -
> s1@group-F83BA0BDB609: Received notification to install snapshot at index 0
> 2025-05-10 17:23:10,137 [grpc-default-executor-2] INFO
> impl.SnapshotInstallationHandler
> (SnapshotInstallationHandler.java:notifyStateMachineToInstallSnapshot(297)) -
> s1@group-F83BA0BDB609: notifyInstallSnapshot: nextIndex is 0 but the leader's
> first available index is 0.
> ......
> 2025-05-10 17:23:11,151 [grpc-default-executor-0] INFO
> impl.SnapshotInstallationHandler
> (SnapshotInstallationHandler.java:notifyStateMachineToInstallSnapshot(365)) -
> s1@group-F83BA0BDB609: InstallSnapshot notification result:
> SNAPSHOT_INSTALLED, at index: (t:1, i:16)
> follower s2:
> 2025-05-10 17:23:11,214 [grpc-default-executor-2] INFO
> impl.SnapshotInstallationHandler
> (SnapshotInstallationHandler.java:notifyStateMachineToInstallSnapshot(262)) -
> s2@group-F83BA0BDB609: Received notification to install snapshot at index 0
> 2025-05-10 17:23:11,214 [grpc-default-executor-2] INFO
> impl.SnapshotInstallationHandler
> (SnapshotInstallationHandler.java:notifyStateMachineToInstallSnapshot(297)) -
> s2@group-F83BA0BDB609: notifyInstallSnapshot: nextIndex is 0 but the leader's
> first available index is 0.
> ......
> 2025-05-10 17:23:12,217 [grpc-default-executor-0] INFO
> impl.SnapshotInstallationHandler
> (SnapshotInstallationHandler.java:notifyStateMachineToInstallSnapshot(365)) -
> s2@group-F83BA0BDB609: InstallSnapshot notification result:
> SNAPSHOT_INSTALLED, at index: (t:1, i:16)
> {code}
> Logs before applying RATIS-2045:
> {code:java}
> 2025-05-10 17:42:54,878 [grpc-default-executor-0] INFO
> impl.SnapshotInstallationHandler
> (SnapshotInstallationHandler.java:notifyStateMachineToInstallSnapshot(221)) -
> s1@group-46FD094EFC86: Received notification to install snapshot at index 0
> 2025-05-10 17:42:54,878 [grpc-default-executor-0] INFO
> impl.SnapshotInstallationHandler
> (SnapshotInstallationHandler.java:notifyStateMachineToInstallSnapshot(230)) -
> s1@group-46FD094EFC86: InstallSnapshot notification result:
> ALREADY_INSTALLED, current snapshot index: -1
> .....
> 2025-05-10 17:42:54,880 [grpc-default-executor-2] INFO
> impl.SnapshotInstallationHandler
> (SnapshotInstallationHandler.java:notifyStateMachineToInstallSnapshot(221)) -
> s2@group-46FD094EFC86: Received notification to install snapshot at index 0
> 2025-05-10 17:42:54,880 [grpc-default-executor-2] INFO
> impl.SnapshotInstallationHandler
> (SnapshotInstallationHandler.java:notifyStateMachineToInstallSnapshot(230)) -
> s2@group-46FD094EFC86: InstallSnapshot notification result:
> ALREADY_INSTALLED, current snapshot index: -1
> {code}
> So, if we accept the behavior introduced by #1053 as reasonable, the test's
> check condition should be changed so that the expected follower snapshot
> index matches the leader's value.
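
As a rough illustration of the adjustment proposed in the description, the sketch below compares the old expectation (follower snapshot index equal to -1) with one that matches the leader's value. The class and method names are hypothetical and are not taken from the Ratis test code. The leader's snapshot index of 16 is an assumption inferred from the follower logs quoted above.

```java
public class SnapshotIndexCheck {
    /** Index reported when a server has no snapshot (the "-1" seen in the older logs). */
    public static final long NO_SNAPSHOT_INDEX = -1L;

    /** Old expectation (hypothetical form): a newly added follower has no snapshot. */
    public static boolean oldCheck(long followerSnapshotIndex) {
        return followerSnapshotIndex == NO_SNAPSHOT_INDEX;
    }

    /** Adjusted expectation (hypothetical form): the follower matches the leader. */
    public static boolean adjustedCheck(long leaderSnapshotIndex, long followerSnapshotIndex) {
        return followerSnapshotIndex == leaderSnapshotIndex;
    }

    public static void main(String[] args) {
        // Assumption: the leader s0 holds a snapshot at index 16, as implied by
        // "SNAPSHOT_INSTALLED, at index: (t:1, i:16)" in the follower logs.
        long leaderSnapshotIndex = 16L;
        String[] followerIds = {"s1", "s2"};
        long[] followerSnapshotIndices = {16L, 16L};

        for (int i = 0; i < followerIds.length; i++) {
            long index = followerSnapshotIndices[i];
            System.out.println(followerIds[i]
                + ": old check passes = " + oldCheck(index)
                + ", adjusted check passes = " + adjustedCheck(leaderSnapshotIndex, index));
        }
    }
}
```

With the values observed after RATIS-2045, the old check rejects both followers while the adjusted check accepts them. With the earlier behavior (follower index -1), only the old check would pass.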