slfan1989 opened a new pull request, #1427:
URL: https://github.com/apache/ratis/pull/1427
## What changes were proposed in this pull request?
This PR improves the diagnostics and reliability of
`InstallSnapshotNotificationTests.testInstallSnapshotDuringBootstrap` by adding
bounded retry logic and comprehensive cluster state logging when configuration
changes fail.
### Key changes:
1. **Added `setConfigurationWithBoundedRetry()` method:**
   - Limits `setConfiguration` retries to 30 attempts (previously unlimited)
   - Sleeps 1 second between retry attempts
   - Logs each attempt with the leader ID and target configuration
   - Dumps detailed cluster state on failure
2. **Added `waitAndCheckNewConfWithDiagnostics()` method:**
   - Wraps `RaftServerTestUtil.waitAndCheckNewConf()` with exception handling
   - Triggers a cluster state dump on any assertion failure or exception
3. **Added `dumpClusterState()` diagnostic method:**
   - Logs comprehensive cluster information, including:
     - Snapshot request/notification counts
     - Leader snapshot info
     - Per-division state (role, leader, term, indices, configuration)
     - Follower next/match indices
     - All server logs
4. **Updated `testInstallSnapshotDuringBootstrap()`:**
   - Replaced the direct `cluster.setConfiguration()` call with the
     bounded-retry version
   - Replaced the configuration check with the diagnostic version
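
For readers unfamiliar with the pattern, the bounded-retry and dump-on-failure behavior described above can be sketched roughly as follows. This is not the code in this PR: the class `BoundedRetrySketch`, the helpers `retryBounded` and `checkWithDiagnostics`, and the `dumpState` callback are hypothetical names used only for illustration, and no Ratis APIs are called. Only the figures of 30 attempts and a 1-second sleep come from the description.

```java
import java.util.concurrent.Callable;

public class BoundedRetrySketch {

  /**
   * Calls op up to maxAttempts times, sleeping sleepMillis after each failed
   * attempt except the last. If every attempt fails, runs dumpState once and
   * throws, keeping the last failure as the cause.
   * (Hypothetical helper; the PR's real method is setConfigurationWithBoundedRetry().)
   */
  static <T> T retryBounded(Callable<T> op, int maxAttempts, long sleepMillis,
      Runnable dumpState) throws InterruptedException {
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.call();
      } catch (InterruptedException e) {
        throw e; // an interrupt is not a retryable failure
      } catch (Exception e) {
        last = e;
        System.err.println("attempt " + attempt + "/" + maxAttempts + " failed: " + e);
        if (attempt < maxAttempts) {
          Thread.sleep(sleepMillis);
        }
      }
    }
    dumpState.run(); // emit diagnostics once, only after the retry budget is spent
    throw new IllegalStateException("gave up after " + maxAttempts + " attempts", last);
  }

  /**
   * Runs check; on any runtime exception or assertion failure, runs dumpState
   * and rethrows the original failure unchanged.
   * (Hypothetical stand-in for the wrapper around waitAndCheckNewConf().)
   */
  static void checkWithDiagnostics(Runnable check, Runnable dumpState) {
    try {
      check.run();
    } catch (RuntimeException | AssertionError e) {
      dumpState.run();
      throw e;
    }
  }
}
```

In a test, a call such as `retryBounded(op, 30, 1000L, dump)` would mirror the limits listed above, where `op` and `dump` are whatever configuration call and state dump the test supplies. Dumping state only after the final failure keeps the log readable when an early attempt fails but a later one succeeds.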
## What is the link to the Apache JIRA
JIRA: RATIS-2501. Improve diagnostics for testInstallSnapshotDuringBootstrap
timeout failures.
## How was this patch tested?
- Existing unit tests