lhotari commented on PR #25696:
URL: https://github.com/apache/pulsar/pull/25696#issuecomment-4390743081

   **[BUG] Empty active segments straddling `t` will fail the entire seek.**
   
   `pulsar-broker/.../ScalableTopicController.java#seekSubscriptionOnSegment` 
(~L646): for a freshly-created active segment that overlaps `timestampMs` but 
has no messages yet (just split, no producer activity), the classification 
falls through to `seekSegmentSubscriptionAsync(name, sub, t)`. On the 
segment-owner broker, `PersistentSubscription.resetCursor(t)` calls 
`PersistentMessageFinder.findMessages(t, …)` which returns `null` (no entries 
match), and then `cursor.getFirstPosition()` is also `null` for an empty 
managed ledger — so the future completes exceptionally with 
`SubscriptionInvalidCursorPosition` (`PersistentSubscription.java` ~L887).
   
   Because `seekSubscription` uses `CompletableFuture.allOf` (fail-fast) and 
there's no compensation, **one empty straddling segment fails the whole 
parent-topic seek**, with the cursor already partially repositioned on the 
segments that completed first. There's no rollback path.
   
   Realistic worst case: split-segment just ran, the new child segments are 
active and empty, operator runs `pulsar-admin scalable-topics seek` to recover 
from a bad deploy → the seek fails with no clear hint that the issue is empty 
children.
   
   Options:
   - Short-circuit in `seekSubscriptionOnSegment` when the segment is known to 
be empty (needs an entry-count signal, e.g. on the segment metadata or a cheap 
stat call).
   - Tolerate `SubscriptionInvalidCursorPosition` from segments the controller 
can verify are empty — distinguishing "empty" from "corrupted cursor" requires 
care.
   - At minimum, aggregate failures and surface which segment(s) failed instead 
of propagating the first one.
   
   This is the kind of issue an end-to-end "produce → seek → consume" 
integration test across multiple segments would have caught — the new 
controller test only exercises mocks, so the managed-ledger reset-cursor path 
isn't covered at all.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to