lhotari opened a new pull request, #26015: URL: https://github.com/apache/pulsar/pull/26015
Main Issue: #22736
Related: #24148

### Motivation

`MockedPulsarServiceBaseTest.deleteNamespaceWithRetry` times out sporadically in CI (#22736). Investigation of a recent occurrence ([TopicPoliciesTest.setupTestTopic in this run](https://github.com/apache/pulsar/actions/runs/27349826959/job/81141262202)) showed the following chain in the test-report logs:

1. A test sets a namespace-level `SubscribeRate` (1 subscribe per consumer per period).
2. A compaction of the namespace's `__change_events` topic starts. Phase two of the compaction seeks the `__compaction` subscription, which disconnects and re-subscribes the compactor's reader.
3. The re-subscribe is denied with `Subscribe limited by subscribe rate limit per consumer.` because the limiter's token bucket is per consumer identifier, and the reader's initial subscribe already consumed the only token. The reader retries in an exponential-backoff loop and the compaction stalls.
4. A subsequent forced namespace deletion blocks in `PersistentTopic.asyncDeleteCursorWithCleanCompactionLedger()`, which waits for the in-flight compaction to complete (the mechanism described in #24148), until the test times out.

This is not only a test problem: a user-configured subscribe rate (namespace, topic, or broker level) can stall compaction on any topic in production and consequently block forced topic/namespace deletion. Throttling broker-internal readers on system topics such as `__change_events` can also stall topic policy updates.

Note that this addresses one root cause of #22736 only; the underlying deletion-vs-compaction deadlock analyzed in #24148 (which can also be reached without any subscribe rate) is a separate issue and is not changed here.

### Modifications

- `PersistentTopic#internalSubscribe`: skip the subscribe rate limit check for the broker-internal `__compaction` subscription and for system topics. This is consistent with the existing system-topic exemptions for the publish and dispatch rate limiters (`SystemTopic#getBrokerPublishRateLimiter`, `AbstractTopic#updateTopicPolicyByNamespacePolicy`).
- Added `CompactionTest.testCompactionNotBlockedBySubscribeRateLimit`, which sets a `SubscribeRate(1, 3600)` on the namespace and verifies that a compaction completes.

### Verifying this change

- [x] Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

- `CompactionTest.testCompactionNotBlockedBySubscribeRateLimit` reproduces the stall: without the fix, the compaction does not complete within 30 seconds (timed out deterministically); with the fix it completes immediately (verified locally with `invocationCount = 10`, 10/10 passes).
- `TopicPoliciesTest` subscribe-rate tests (`testGetSetSubscribeRate`, `testDisableSubscribeRate`, `testRemoveSubscribeRate`) still pass, confirming regular consumers remain throttled.

### Does this pull request potentially affect one of the following parts:

*If the box was checked, please highlight the changes*

- [ ] Dependencies (add or upgrade a dependency)
- [ ] The public API
- [ ] The schema
- [ ] The default values of configurations
- [ ] The threading model
- [ ] The binary protocol
- [ ] The REST endpoints
- [ ] The admin CLI options
- [ ] The metrics
- [ ] Anything that affects deployment

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
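
For readers who want the mechanism in isolation, the failure chain and the exemption described under Modifications can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `PerConsumerSubscribeLimiter`, `SubscribeGate`, and their methods are hypothetical names, and the limiter never refills (a single rate period is assumed), which roughly mirrors a long period such as `SubscribeRate(1, 3600)` within a short test.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a per-consumer subscribe limiter; not Pulsar's actual class. */
final class PerConsumerSubscribeLimiter {
    private final int permitsPerPeriod;
    // Subscribes already counted per consumer identifier in the current period.
    private final Map<String, Integer> used = new HashMap<>();

    PerConsumerSubscribeLimiter(int permitsPerPeriod) {
        this.permitsPerPeriod = permitsPerPeriod;
    }

    /** Takes one token for the consumer, or returns false if none is left (no refill in this sketch). */
    boolean tryAcquire(String consumerId) {
        int count = used.getOrDefault(consumerId, 0);
        if (count >= permitsPerPeriod) {
            return false;
        }
        used.put(consumerId, count + 1);
        return true;
    }
}

/** Hypothetical gate showing where the exemption sits relative to the limiter check. */
final class SubscribeGate {
    // Subscription name used by the broker-internal compactor, as named in the PR text.
    static final String COMPACTION_SUBSCRIPTION = "__compaction";

    /** Broker-internal compaction readers and system topics bypass the limiter. */
    static boolean isExempt(String subscriptionName, boolean systemTopic) {
        return systemTopic || COMPACTION_SUBSCRIPTION.equals(subscriptionName);
    }

    /** Returns true if the subscribe may proceed; exempt subscribes consume no token. */
    static boolean allowSubscribe(PerConsumerSubscribeLimiter limiter, String consumerId,
                                  String subscriptionName, boolean systemTopic) {
        if (isExempt(subscriptionName, systemTopic)) {
            return true;
        }
        return limiter.tryAcquire(consumerId);
    }
}
```

With a limiter of one permit, a regular consumer's second subscribe under the same consumer identifier is rejected, while repeated subscribes on `__compaction` or on a system topic keep succeeding. Placing the exemption before `tryAcquire` also means internal readers do not use up tokens.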
