lhotari opened a new pull request, #26016: URL: https://github.com/apache/pulsar/pull/26016
Fixes #24148 Related: #22736 ### Motivation Forced topic/namespace deletion can deadlock with an in-flight compaction, as analyzed in #24148: `PersistentTopic.asyncDeleteCursorWithCleanCompactionLedger()` waits for `currentCompaction` while the compactor cannot make progress on the topic being deleted. PR #24366 addressed the infinite wait: the compactor's `RawReader` is created with `retryOnRecoverableErrors=false`, so when the deletion fences the topic, the reader's reconnect fails with `ServiceNotReady` (`TopicFencedException` maps to it), the reader closes, and `currentCompaction` completes exceptionally. However, the deletion still fails every time after that: - `asyncDeleteCursorWithCleanCompactionLedger()` chains the compaction cursor deletion with `currentCompaction.thenCompose(...)`. When `currentCompaction` completed exceptionally, the cursor deletion is skipped and the unsubscribe fails with the compaction's exception instead. - The failed `currentCompaction` future is never reset (`triggerCompaction` is blocked while the topic is deleting), so **every subsequent deletion attempt fails immediately** until the topic instance is reloaded. `deleteNamespaceWithRetry` in tests retries into this poisoned state until it times out, producing the flaky-test failures tracked in #22736. #24366 effectively turned the deadlock into a permanent failure loop. Its regression test did not catch this because `testConcurrentCompactionAndTopicDelete` only asserts `deleteTopicFuture.isDone()`, which is also true when the deletion fails. ### Modifications - `PersistentTopic.asyncDeleteCursorWithCleanCompactionLedger()`: proceed with the compaction cursor deletion once `currentCompaction` has finished, regardless of whether it completed normally or exceptionally. Only an in-flight compaction is awaited. A compaction terminated by the deletion's fencing (per #24366) no longer fails the deletion. - Strengthened `testConcurrentCompactionAndTopicDelete` to assert that the topic deletion succeeded (`deleteTopicFuture.get(...)`) instead of only asserting completion. Without the fix, this now fails with the terminated compaction's `CancellationException`. - Added `CompactionTest.testForcedDeleteSucceedsAfterFailedCompaction`: deterministically fails a compaction (injection after the phase-two seek), then verifies a forced topic deletion succeeds. Without the fix, the deletion fails with the previous compaction's exception. - Added `CompactionTest.testForcedNamespaceDeleteWithInflightCompaction`: end-to-end reproduction of #24148 — forced namespace deletion while a compaction is in-flight (kept in-flight by 2000 one-message-at-a-time reads via `dispatcherMaxReadBatchSize=1`). Without the fix, it fails with the `ConditionTimeoutException` in `deleteNamespaceWithRetry` reported in #22736. Enabled `forceDeleteNamespaceAllowed` in the test config for this. ### Verifying this change - [x] Make sure that the change passes the CI checks. This change added tests and can be verified as follows: - All three tests above were verified red without the fix and green with the fix. - The full `CompactionTest` class passes locally. ### Does this pull request potentially affect one of the following parts: *If the box was checked, please highlight the changes* - [ ] Dependencies (add or upgrade a dependency) - [ ] The public API - [ ] The schema - [ ] The default values of configurations - [ ] The threading model - [ ] The binary protocol - [ ] The REST endpoints - [ ] The admin CLI options - [ ] The metrics - [ ] Anything that affects deployment -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
