arunsarin85 opened a new pull request, #10024: URL: https://github.com/apache/ozone/pull/10024
## What changes were proposed in this pull request?

Added two integration tests to TestOzoneManagerHASnapshot that verify SnapshotDeletingService (SDS) behaves correctly when an OM leader failover happens while snapshot cleanup is pending. SDS is a background service on the OM leader responsible for cleaning up deleted snapshots.

testSnapshotDeletingServiceDuringOMFailover

Simulates SDS being blocked on the old leader (via suspend()) while a snapshot is queued for deletion, then triggers a leader failover by shutting down the old leader. Verifies that the new leader's SDS independently picks up the pending SNAPSHOT_DELETED entry, purges it from the DB, and leaves the snapshot chain in a consistent state.

testSnapshotDeletingServiceWithMultipleSnapshotsDuringFailover

Extends the above scenario to 3 snapshots queued for deletion before the failover. Verifies that the new leader's SDS processes the full backlog and that chain integrity holds after all cleanups complete.

Please describe your PR in detail:

Two @Test methods are added:

1. testSnapshotDeletingServiceDuringOMFailover
   - Creates 5 keys and one snapshot
   - Suspends SDS on the current leader (simulates SDS blocked mid-run)
   - Deletes the snapshot and waits for it to reach the SNAPSHOT_DELETED state in the DB
   - Shuts down the old leader → forces election of a genuinely different new leader (the cluster has 3 OMs, quorum=2)
   - Waits for the new leader's SDS to purge the snapshot from snapshotInfoTable
   - Asserts that the snapshot chain is not corrupted
   - A finally block restores the 3-node cluster by restarting the old OM
2. testSnapshotDeletingServiceWithMultipleSnapshotsDuringFailover
   - Creates 3 snapshots with distinct keys captured in each
   - Suspends SDS on the current leader
   - Deletes all 3 snapshots and waits for all of them to reach SNAPSHOT_DELETED
   - Shuts down the old leader and waits for a new leader to be elected
   - Waits for the new leader's SDS to purge all 3 snapshots
   - Asserts snapshot chain integrity

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8703

## How was this patch tested?

The two new tests were run locally against a 3-node MiniOzoneHAClusterImpl:

    mvn test -pl hadoop-ozone/integration-test -am \
      -Dtest="TestOzoneManagerHASnapshot#testSnapshotDeletingServiceDuringOMFailover+testSnapshotDeletingServiceWithMultipleSnapshotsDuringFailover" \
      -DfailIfNoTests=false

Result:

    Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 33.88 s
    BUILD SUCCESS

<img width="909" height="801" alt="image" src="https://github.com/user-attachments/assets/784a6b99-894e-499f-9b5f-7a62648a7973" />
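
For readers unfamiliar with the scenario, the flow both tests exercise (suspend cleanup on the old leader, mark snapshots deleted, fail over, wait for the new leader to purge) can be sketched with a toy model. This is only an illustration under stated assumptions: `SnapshotTable`, `DeletingService`, `waitFor`, and `runScenario` are hypothetical stand-ins written for this sketch, not Ozone classes or the code added by this PR, and the shared in-memory map merely imitates a replicated snapshotInfoTable.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Toy model only: none of these types exist in Ozone.
class SnapshotFailoverSketch {
  enum Status { ACTIVE, DELETED }

  /** Hypothetical stand-in for the replicated snapshot table seen by every OM. */
  static final class SnapshotTable {
    private final Map<String, Status> entries = new LinkedHashMap<>();
    synchronized void create(String name) { entries.put(name, Status.ACTIVE); }
    synchronized void markDeleted(String name) { entries.replace(name, Status.DELETED); }
    synchronized boolean contains(String name) { return entries.containsKey(name); }
    synchronized void purge(String name) { entries.remove(name); }
    synchronized List<String> pendingDeletes() {
      List<String> pending = new ArrayList<>();
      for (Map.Entry<String, Status> e : entries.entrySet()) {
        if (e.getValue() == Status.DELETED) {
          pending.add(e.getKey());
        }
      }
      return pending;
    }
  }

  /** Hypothetical per-node cleanup service; a suspended instance purges nothing. */
  static final class DeletingService {
    private volatile boolean suspended;
    void suspend() { suspended = true; }
    int runOnce(SnapshotTable table) {
      if (suspended) {
        return 0;
      }
      List<String> pending = table.pendingDeletes();
      pending.forEach(table::purge);
      return pending.size();
    }
  }

  /** Polls a condition until it holds or the timeout expires. */
  static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!check.getAsBoolean()) {
      if (System.currentTimeMillis() > deadline) {
        throw new IllegalStateException("Timed out waiting for condition");
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting", e);
      }
    }
  }

  /** Runs the failover scenario; returns true if the new leader purged everything. */
  static boolean runScenario(int snapshotCount) {
    SnapshotTable table = new SnapshotTable();
    DeletingService oldLeaderSds = new DeletingService();
    DeletingService newLeaderSds = new DeletingService();

    List<String> names = new ArrayList<>();
    for (int i = 0; i < snapshotCount; i++) {
      names.add("snap-" + i);
      table.create("snap-" + i);
    }

    // Block cleanup on the old leader, then queue every snapshot for deletion.
    oldLeaderSds.suspend();
    names.forEach(table::markDeleted);
    if (oldLeaderSds.runOnce(table) != 0
        || table.pendingDeletes().size() != snapshotCount) {
      return false; // the suspended service must leave the backlog untouched
    }

    // "Failover": the new leader's service starts working in the background.
    new Thread(() -> newLeaderSds.runOnce(table)).start();

    // Wait until no queued snapshot remains in the table.
    waitFor(() -> names.stream().noneMatch(table::contains), 10, 5000);
    return table.pendingDeletes().isEmpty();
  }

  public static void main(String[] args) {
    System.out.println("single snapshot purged: " + runScenario(1));
    System.out.println("three snapshots purged: " + runScenario(3));
  }
}
```

The sketch polls for the purge rather than asserting immediately, since in the scenario described above the cleanup runs asynchronously on whichever node becomes leader.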
