coderzc opened a new pull request, #26012:
URL: https://github.com/apache/pulsar/pull/26012

   Fixes #25996
   
   ### Motivation
   
   With `isDelayedDeliveryDeliverAtTimeStrict=true`, delayed messages can 
remain undelivered indefinitely past their `deliverAt` time while a consumer is 
blocked in `receive()`. The stalled messages are only released when an 
unrelated dispatch event happens (e.g. a new publish or a consumer reconnect); 
on a quiet topic the delay is unbounded. With the default `strict=false` the 
same traffic is delivered on time.
   
   The root cause is in `AbstractDelayedDeliveryTracker.updateTimer()`. Because 
delivery timestamps are trimmed for memory efficiency (up to ~511ms with the 
default `tickTimeMillis=1000`), `getScheduledMessages()` can pop a message 
slightly before its real `deliverAt`. In strict mode the dispatcher re-adds the 
not-yet-due message, which calls `updateTimer()`:
   
   1. The existing timer (armed for the next message) is cancelled.
   2. `delayMillis` for the re-added message is negative, so the method takes 
its early return — but it leaves `currentTimeoutTarget` pointing at the 
previous target and `timeout` non-null (now cancelled).
   3. When the early message is finally delivered, the next `updateTimer()` 
sees `timestamp == currentTimeoutTarget`, concludes the timer is already 
correctly armed, and returns. No live timer exists, so the remaining delayed 
messages are never delivered until an external dispatch round happens to find 
them.
   
   `strict=false` is immune because its cutoff (`now + tickTimeMillis`) covers 
the trim window, so early-popped messages are delivered instead of being 
re-added.
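
The trim arithmetic can be illustrated with a small standalone sketch. It is not the Pulsar source: `TrimWindowSketch`, `trim`, and `cutoff` are hypothetical names, the 9-bit trim width is an assumption inferred from the ~511ms figure (2^9 - 1 = 511), and the strict cutoff being plain `now` is likewise assumed.

```java
class TrimWindowSketch {
    // Assumption: 9 low bits are cleared when tickTimeMillis=1000,
    // which is what the ~511ms figure implies.
    static final int TRIM_BITS = 9;
    static final long TICK_TIME_MILLIS = 1000;

    /** Clears the low TRIM_BITS bits, rounding the timestamp down. */
    static long trim(long timestampMillis) {
        return timestampMillis & (-1L << TRIM_BITS);
    }

    /** Non-strict cutoff is now + tickTimeMillis; strict is assumed to be now. */
    static long cutoff(long now, boolean strict) {
        return strict ? now : now + TICK_TIME_MILLIS;
    }

    public static void main(String[] args) {
        long deliverAt = 10_500;       // real delivery time
        long stored = trim(deliverAt); // trimmed value kept by the tracker
        long now = 10_300;             // after the trimmed time, before deliverAt

        // The tracker compares the trimmed value, so the message looks due.
        System.out.println("popped by tracker: " + (stored <= cutoff(now, true)));
        // Strict mode checks the real deliverAt against now: not due, re-add.
        System.out.println("due in strict mode: " + (deliverAt <= cutoff(now, true)));
        // Non-strict mode has tickTimeMillis of slack, which exceeds the trim.
        System.out.println("due in non-strict mode: " + (deliverAt <= cutoff(now, false)));
    }
}
```

With these values the trimmed timestamp is 10240, so the message is popped 260ms early, rejected by the strict check, and accepted by the non-strict one.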
   
   Thanks to @glumia for the detailed report and root-cause analysis.
   
   ### Modifications
   
   - In the `delayMillis < 0` early return of 
`AbstractDelayedDeliveryTracker.updateTimer()`, reset `currentTimeoutTarget = 
-1` and `timeout = null` so a later call cannot short-circuit on stale state 
and will correctly re-arm the timer.
   - Add a deterministic unit test 
`testStrictModeTimerStallsAfterEarlyPopAndReAdd` in 
`InMemoryDeliveryTrackerTest` that reproduces the early-pop / re-add sequence 
and asserts a delivery timer remains armed for the still-pending message. It 
fails without the fix and passes with it.
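
For reviewers who want to step through the state transitions, here is a simplified model of the three-step sequence from the Motivation and of the reset described above. It is a sketch, not the actual `AbstractDelayedDeliveryTracker` code: `TimerStateSketch`, `FakeTimeout`, the `resetOnEarlyReturn` flag, and the timestamps are all hypothetical, and the real timer-fire path is omitted.

```java
import java.util.PriorityQueue;

class TimerStateSketch {
    /** Stand-in for a timer handle; it only records cancellation. */
    static final class FakeTimeout {
        boolean cancelled;
    }

    final boolean resetOnEarlyReturn; // true models the reset added by this PR
    final PriorityQueue<Long> timestamps = new PriorityQueue<>();
    long now;
    long currentTimeoutTarget = -1;
    FakeTimeout timeout;

    TimerStateSketch(boolean resetOnEarlyReturn) {
        this.resetOnEarlyReturn = resetOnEarlyReturn;
    }

    void add(long trimmedTimestamp) {
        timestamps.add(trimmedTimestamp);
        updateTimer();
    }

    void pop() {
        timestamps.poll();
        updateTimer();
    }

    boolean hasLiveTimer() {
        return timeout != null && !timeout.cancelled;
    }

    void updateTimer() {
        if (timestamps.isEmpty()) {
            if (timeout != null) {
                timeout.cancelled = true;
                timeout = null;
                currentTimeoutTarget = -1;
            }
            return;
        }
        long timestamp = timestamps.peek();
        if (timestamp == currentTimeoutTarget) {
            return; // assumed to be armed already
        }
        if (timeout != null) {
            timeout.cancelled = true; // step 1: existing timer is cancelled
        }
        if (timestamp - now < 0) {
            if (resetOnEarlyReturn) {
                // Clear the stale state so a later call re-arms the timer.
                currentTimeoutTarget = -1;
                timeout = null;
            }
            return; // step 2: early return
        }
        currentTimeoutTarget = timestamp;
        timeout = new FakeTimeout();
    }

    /** Replays early pop, re-add, and final delivery; reports whether a timer is live. */
    static boolean timerLiveAfterEarlyPop(boolean resetOnEarlyReturn) {
        TimerStateSketch t = new TimerStateSketch(resetOnEarlyReturn);
        t.now = 5_000;
        t.add(10_240); // message A, trimmed (real deliverAt later)
        t.add(20_480); // message B
        t.now = 10_300;
        t.pop();       // A popped early; timer re-armed for B
        t.add(10_240); // strict mode re-adds A: cancel, then early return
        t.now = 10_600;
        t.pop();       // step 3: A delivered, B is next
        return t.hasLiveTimer();
    }

    public static void main(String[] args) {
        System.out.println("without reset: " + timerLiveAfterEarlyPop(false));
        System.out.println("with reset:    " + timerLiveAfterEarlyPop(true));
    }
}
```

In this model the run without the reset ends with a cancelled handle and `currentTimeoutTarget` still equal to B's timestamp, so nothing is armed; the run with the reset ends with a live timer for B.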
   
   ### Verifying this change
   
   - [x] Make sure that the change passes the CI checks.
   
   This change is already covered by the added unit test 
`testStrictModeTimerStallsAfterEarlyPopAndReAdd`. Existing 
`InMemoryDeliveryTrackerTest` and `BucketDelayedDeliveryTrackerTest` continue 
to pass.
   
   ### Documentation
   
   - [ ] `doc-required`
   - [x] `doc-not-needed`
   
   ### Matching PR in forked repository
   
   PR in forked repository: coderzc/pulsar (this change is a bug fix; the fix 
branch is `fix/delayed-delivery-strict-timer-stall-25996`).
   


