lhotari opened a new pull request, #26015:
URL: https://github.com/apache/pulsar/pull/26015

   Main Issue: #22736
   
   Related: #24148
   
   ### Motivation
   
   `MockedPulsarServiceBaseTest.deleteNamespaceWithRetry` times out 
sporadically in CI (#22736). Investigation of a recent occurrence 
([TopicPoliciesTest.setupTestTopic in this 
run](https://github.com/apache/pulsar/actions/runs/27349826959/job/81141262202))
 showed the following chain in the test-report logs:
   
   1. A test sets a namespace-level `SubscribeRate` (1 subscribe per consumer 
per period).
   2. A compaction of the namespace's `__change_events` topic starts. Phase two 
of the compaction seeks the `__compaction` subscription, which disconnects and 
re-subscribes the compactor's reader.
   3. The re-subscribe is denied with `Subscribe limited by subscribe rate 
limit per consumer.` because the limiter's token bucket is per consumer 
identifier and the reader's initial subscribe already consumed the only token. 
The reader retries in an exponential-backoff loop and the compaction stalls.
   4. A subsequent forced namespace deletion blocks in 
`PersistentTopic.asyncDeleteCursorWithCleanCompactionLedger()`, which waits for 
the in-flight compaction to complete (the mechanism described in #24148), until 
the test times out.
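
Step 3 of the chain above can be illustrated with a minimal, self-contained sketch. It is not Pulsar's actual limiter: the class `SubscribeRateSketch`, its method names, and the fixed-window refill are assumptions made for illustration. It only shows why a per-consumer bucket holding a single token admits the reader's initial subscribe and then denies the re-subscribe triggered by the seek.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified stand-in for a per-consumer subscribe rate limiter (hypothetical, not Pulsar code). */
final class SubscribeRateSketch {
    private final int permitsPerPeriod;
    private final long periodMillis;
    // One bucket per consumer identifier: {remaining tokens, window start in millis}.
    private final Map<String, long[]> buckets = new HashMap<>();

    SubscribeRateSketch(int permitsPerPeriod, long periodMillis) {
        this.permitsPerPeriod = permitsPerPeriod;
        this.periodMillis = periodMillis;
    }

    /** Returns true if a subscribe by consumerId is admitted at time nowMillis. */
    boolean tryAcquire(String consumerId, long nowMillis) {
        long[] bucket = buckets.computeIfAbsent(
                consumerId, id -> new long[] {permitsPerPeriod, nowMillis});
        if (nowMillis - bucket[1] >= periodMillis) {
            // A new period has started: refill the bucket.
            bucket[0] = permitsPerPeriod;
            bucket[1] = nowMillis;
        }
        if (bucket[0] > 0) {
            bucket[0]--;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Mirrors SubscribeRate(1, 3600): one subscribe per consumer per hour.
        SubscribeRateSketch limiter = new SubscribeRateSketch(1, 3_600_000L);
        String reader = "compactor-reader"; // hypothetical consumer identifier
        System.out.println(limiter.tryAcquire(reader, 0L));     // initial subscribe: true
        System.out.println(limiter.tryAcquire(reader, 1_000L)); // re-subscribe after seek: false
    }
}
```

With a one-hour period, every retry within that hour is denied as well, which matches the stalled compaction seen in the logs.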
   
   This is not only a test problem: a user-configured subscribe rate 
(namespace, topic, or broker level) can stall compaction on any topic in 
production and consequently block forced topic/namespace deletion. Throttling 
broker-internal readers on system topics such as `__change_events` can also 
stall topic policy updates.
   
   Note that this addresses one root cause of #22736 only; the underlying 
deletion-vs-compaction deadlock analyzed in #24148 (which can also be reached 
without any subscribe rate) is a separate issue and is not changed here.
   
   ### Modifications
   
   - `PersistentTopic#internalSubscribe`: skip the subscribe rate limit check 
for the broker-internal `__compaction` subscription and for system topics. This 
is consistent with the existing system-topic exemptions for the publish and 
dispatch rate limiters (`SystemTopic#getBrokerPublishRateLimiter`, 
`AbstractTopic#updateTopicPolicyByNamespacePolicy`).
   - Added `CompactionTest.testCompactionNotBlockedBySubscribeRateLimit`, which 
sets a `SubscribeRate(1, 3600)` on the namespace and verifies that a compaction 
completes.
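
The exemption in the first bullet can be sketched roughly as follows. This is an assumption-laden illustration, not the code in this PR: `SubscribeRateExemptionSketch`, `shouldCheckSubscribeRate`, and the simplified `isSystemTopic` helper (which only recognizes a non-partitioned `__change_events` local name) are hypothetical; only the `__compaction` subscription name and the `__change_events` topic name come from the description above.

```java
/** Hypothetical sketch of the subscribe-rate exemption decision (not the actual broker code). */
final class SubscribeRateExemptionSketch {
    // Subscription name used by the compactor, as named in this PR description.
    static final String COMPACTION_SUBSCRIPTION = "__compaction";

    /** Simplified stand-in for the broker's system-topic check; assumption for illustration only. */
    static boolean isSystemTopic(String topicName) {
        String localName = topicName.substring(topicName.lastIndexOf('/') + 1);
        return localName.equals("__change_events");
    }

    /** Returns true if the subscribe rate limit should be enforced for this subscribe request. */
    static boolean shouldCheckSubscribeRate(String topicName, String subscriptionName) {
        return !COMPACTION_SUBSCRIPTION.equals(subscriptionName) && !isSystemTopic(topicName);
    }

    public static void main(String[] args) {
        String userTopic = "persistent://tenant/ns/my-topic";
        String systemTopic = "persistent://tenant/ns/__change_events";
        System.out.println(shouldCheckSubscribeRate(userTopic, "my-sub"));       // regular consumer
        System.out.println(shouldCheckSubscribeRate(userTopic, "__compaction")); // compactor reader
        System.out.println(shouldCheckSubscribeRate(systemTopic, "reader-sub")); // system topic
    }
}
```

Under these assumptions only the first call is throttled; the compactor's subscription and any subscription on the system topic bypass the check, while regular consumers on user topics remain limited.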
   
   ### Verifying this change
   
   - [x] Make sure that the change passes the CI checks.
   
   This change added tests and can be verified as follows:
   
   - `CompactionTest.testCompactionNotBlockedBySubscribeRateLimit` reproduces 
the stall: without the fix, the compaction deterministically fails to complete 
within 30 seconds and the test times out; with the fix it completes immediately 
(verified locally with `invocationCount = 10`, 10/10 passes).
   - `TopicPoliciesTest` subscribe-rate tests (`testGetSetSubscribeRate`, 
`testDisableSubscribeRate`, `testRemoveSubscribeRate`) still pass, confirming 
regular consumers remain throttled.
   
   ### Does this pull request potentially affect one of the following parts:
   
   *If the box was checked, please highlight the changes*
   
   - [ ] Dependencies (add or upgrade a dependency)
   - [ ] The public API
   - [ ] The schema
   - [ ] The default values of configurations
   - [ ] The threading model
   - [ ] The binary protocol
   - [ ] The REST endpoints
   - [ ] The admin CLI options
   - [ ] The metrics
   - [ ] Anything that affects deployment
   

