[ 
https://issues.apache.org/jira/browse/RATIS-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz-wo Sze resolved RATIS-2291.
-------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

The pull request is now merged.  Thanks, [~slfan1989]!

> Fix failing 
> TestInstallSnapshotNotificationWithGrpc#testAddNewFollowersNoSnapshot
> ---------------------------------------------------------------------------------
>
>                 Key: RATIS-2291
>                 URL: https://issues.apache.org/jira/browse/RATIS-2291
>             Project: Ratis
>          Issue Type: Bug
>          Components: test
>            Reporter: Shilun Fan
>            Assignee: Shilun Fan
>            Priority: Major
>             Fix For: 3.2.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> During the investigation of RATIS-2251, we encountered a persistent unit test 
> failure in 
> TestInstallSnapshotNotificationWithGrpc#testAddNewFollowersNoSnapshot. 
> Initially, we suspected this was caused by the JUnit version upgrade, but 
> further analysis confirms that the test also fails under JUnit 4. Detailed 
> discussions and debugging steps can be found in the comments of PR #1227.
>  
> While addressing the failure in 
> {{TestInstallSnapshotNotificationWithGrpc#testAddNewFollowersNoSnapshot}}, 
> I found that the error persists even after applying the fix from RATIS-2045.
> RATIS-2045 fixed the issue where SnapshotInstallationHandler didn't notify 
> followers to install snapshots when the snapshot index was -1 and the 
> leader's firstAvailableLogIndex was 0 (PR 
> [#1053|https://github.com/apache/ratis/pull/1053]).
> That PR changes whether followers pull snapshots from the leader.
> From the logs, we can observe that the newly added followers {{s1}} and 
> {{s2}} have both synchronized snapshots from the leader {{s0}}. As a 
> result, the snapshot index for followers {{s1}} and {{s2}} becomes {{16}} (16 
> because we manually created messages twice), instead of {{-1}}. 
> Therefore, the current check condition, which expects {{-1}}, is problematic.
>  
> {code:java}
> follower s1:
> 2025-05-10 17:23:10,134 [grpc-default-executor-2] INFO  
> impl.SnapshotInstallationHandler 
> (SnapshotInstallationHandler.java:notifyStateMachineToInstallSnapshot(262)) - 
> s1@group-F83BA0BDB609: Received notification to install snapshot at index 0
> 2025-05-10 17:23:10,137 [grpc-default-executor-2] INFO  
> impl.SnapshotInstallationHandler 
> (SnapshotInstallationHandler.java:notifyStateMachineToInstallSnapshot(297)) - 
> s1@group-F83BA0BDB609: notifyInstallSnapshot: nextIndex is 0 but the leader's 
> first available index is 0.
> ......
> 2025-05-10 17:23:11,151 [grpc-default-executor-0] INFO  
> impl.SnapshotInstallationHandler 
> (SnapshotInstallationHandler.java:notifyStateMachineToInstallSnapshot(365)) - 
> s1@group-F83BA0BDB609: InstallSnapshot notification result: 
> SNAPSHOT_INSTALLED, at index: (t:1, i:16)
> follower s2:
> 2025-05-10 17:23:11,214 [grpc-default-executor-2] INFO  
> impl.SnapshotInstallationHandler 
> (SnapshotInstallationHandler.java:notifyStateMachineToInstallSnapshot(262)) - 
> s2@group-F83BA0BDB609: Received notification to install snapshot at index 0
> 2025-05-10 17:23:11,214 [grpc-default-executor-2] INFO  
> impl.SnapshotInstallationHandler 
> (SnapshotInstallationHandler.java:notifyStateMachineToInstallSnapshot(297)) - 
> s2@group-F83BA0BDB609: notifyInstallSnapshot: nextIndex is 0 but the leader's 
> first available index is 0.
> ......
> 2025-05-10 17:23:12,217 [grpc-default-executor-0] INFO  
> impl.SnapshotInstallationHandler 
> (SnapshotInstallationHandler.java:notifyStateMachineToInstallSnapshot(365)) - 
> s2@group-F83BA0BDB609: InstallSnapshot notification result: 
> SNAPSHOT_INSTALLED, at index: (t:1, i:16) {code}
> Logs before applying RATIS-2045:
> {code:java}
> 2025-05-10 17:42:54,878 [grpc-default-executor-0] INFO  
> impl.SnapshotInstallationHandler 
> (SnapshotInstallationHandler.java:notifyStateMachineToInstallSnapshot(221)) - 
> s1@group-46FD094EFC86: Received notification to install snapshot at index 0
> 2025-05-10 17:42:54,878 [grpc-default-executor-0] INFO  
> impl.SnapshotInstallationHandler 
> (SnapshotInstallationHandler.java:notifyStateMachineToInstallSnapshot(230)) - 
> s1@group-46FD094EFC86: InstallSnapshot notification result: 
> ALREADY_INSTALLED, current snapshot index: -1
> .....
> 2025-05-10 17:42:54,880 [grpc-default-executor-2] INFO  
> impl.SnapshotInstallationHandler 
> (SnapshotInstallationHandler.java:notifyStateMachineToInstallSnapshot(221)) - 
> s2@group-46FD094EFC86: Received notification to install snapshot at index 0
> 2025-05-10 17:42:54,880 [grpc-default-executor-2] INFO  
> impl.SnapshotInstallationHandler 
> (SnapshotInstallationHandler.java:notifyStateMachineToInstallSnapshot(230)) - 
> s2@group-46FD094EFC86: InstallSnapshot notification result: 
> ALREADY_INSTALLED, current snapshot index: -1 {code}
> So, if we believe that #1053 is reasonable, we should modify the check 
> condition by adjusting the expected value to match the leader's value.
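
The adjustment proposed above can be illustrated with a minimal, self-contained sketch. It is not the Ratis test code: the class and method names (SnapshotIndexCheck, oldCheck, newCheck) are hypothetical, and the leader's snapshot index of 16 is an assumption inferred from the follower logs, which report SNAPSHOT_INSTALLED at (t:1, i:16).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the two expectations discussed in the description.
 * None of these names come from Ratis; they only illustrate the comparison.
 */
public class SnapshotIndexCheck {
  /** Snapshot index of a peer without a snapshot, as reported in the logs before RATIS-2045. */
  public static final long NO_SNAPSHOT_INDEX = -1L;

  /** Old expectation: a newly added follower still has no snapshot. */
  public static boolean oldCheck(long followerSnapshotIndex) {
    return followerSnapshotIndex == NO_SNAPSHOT_INDEX;
  }

  /** Proposed expectation: the follower's snapshot index matches the leader's. */
  public static boolean newCheck(long leaderSnapshotIndex, long followerSnapshotIndex) {
    return followerSnapshotIndex == leaderSnapshotIndex;
  }

  public static void main(String[] args) {
    // Assumed value for leader s0; the description implies it but does not state it.
    long leaderSnapshotIndex = 16L;

    // Follower values taken from the logs after applying RATIS-2045.
    Map<String, Long> followers = new LinkedHashMap<>();
    followers.put("s1", 16L);
    followers.put("s2", 16L);

    for (Map.Entry<String, Long> e : followers.entrySet()) {
      System.out.println(e.getKey()
          + ": oldCheck=" + oldCheck(e.getValue())
          + ", newCheck=" + newCheck(leaderSnapshotIndex, e.getValue()));
    }
  }
}
```

Under these assumptions the old expectation fails for both followers while the comparison against the leader's value holds, which is the behavior the description attributes to #1053.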



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
