void-ptr974 opened a new issue, #26027: URL: https://github.com/apache/pulsar/issues/26027
### Search before reporting - [x] I searched in the issues and found nothing similar. ### Motivation `PendingAcksMap` is maintained per consumer and is on the broker-side consumer ack path. It tracks pending acknowledgements by `(ledgerId, entryId)` and is touched by dispatch, individual ack, partial ack, redelivery, and mark-delete cleanup. The current implementation uses nested ordered maps with boxed keys and object values. This keeps the code simple, but it also adds per-entry object overhead, allocation rate, and GC pressure when a consumer has a large pending-ack window. This issue tracks a series of focused improvements to reduce that overhead without changing consumer ack semantics. A full draft implementation is available for review and validation: https://github.com/apache/pulsar/pull/26024 ### Solution The main insight is that `PendingAcksMap` still needs ordering at the ledger level, but it does not need ordered storage inside each ledger. The outer ledger map must stay ordered because `removeAllUpTo(markDeletePosition)` removes whole ledger buckets by ledger range. Inside one ledger, the required operations are exact lookup, exact remove, remaining-unacked update, and boundary cleanup for `entryId <= markDeleteEntryId`. These operations do not require the inner map to iterate entries in sorted order. Redelivery ordering is not inherited from the per-ledger pending-ack map. The redelivery path rebuilds ordered positions separately, so replacing the inner ordered map does not change the redelivery ordering contract. The proposed direction is: ```text TreeMap<ledgerId, LedgerPendingAcks> LedgerPendingAcks: entryId -> packed pending ack value ``` The packed value keeps the existing pending-ack fields in a `long`, avoiding a per-entry metadata object. A local primitive `Long2LongOpenHashMap` avoids boxed key/value map overhead on the hot path. During benchmarking, the primitive inner map showed the expected benefit for exact-key operations, but exposed one trade-off: same-ledger prefix cleanup no longer has the cheap `TreeMap.headMap(...).clear()` path. To address that, the draft adds a lightweight per-ledger `BitSet` index for prefix cleanup. The BitSet is only a performance index, not a source of pending-ack metadata. If an entry id cannot use the index, cleanup falls back to scanning the primitive map, preserving correctness. The full change can be split into focused PRs: 1. Make `PendingAcksMap#size()` O(1). 2. Pack pending ack metadata into a `long`. 3. Add local primitive `Long2LongOpenHashMap`. 4. Use primitive storage inside `PendingAcksMap`. 5. Add the BitSet prefix index. Each step has a separate purpose: remove traversal cost, reduce per-entry objects, remove boxing overhead, and recover practical prefix-cleanup performance. ### Alternatives A few alternatives were considered: - Keep the current nested `TreeMap` layout. This preserves simple prefix cleanup, but keeps the per-entry object and boxing overhead. - Copy or reimplement a primitive ordered tree map. This would preserve ordered-prefix operations, but the implementation cost and review risk are much higher. - Add an external primitive collection dependency. This avoids local collection code, but introduces dependency and licensing/review overhead for a narrow broker hot-path use case. The proposed approach keeps the ordered structure only where the semantics require it, and uses a smaller local primitive structure for the inner hot path. ### Anything else? The draft PR includes tests for the changed assumptions: - unordered inner storage still preserves `removeAllUpTo` behavior; - whole-ledger and boundary-ledger cleanup remain correct; - entries after the mark-delete position are retained; - packed values preserve both `remainingUnacked` and `stickyKeyHash`; - negative sticky-key hashes round-trip correctly; - BitSet fallback remains correct when the index cannot be used; - the primitive map is checked with oracle, random, collision, and edge-case tests. Benchmark and JOL data will be added in comments. The benchmark uses datasets tied to real defaults such as `maxUnackedMessagesPerConsumer = 50000`, `maxUnackedMessagesPerSubscription = 200000`, and `receiverQueueSize = 1000`. This does not change the ack protocol, subscription semantics, cursor mark-delete semantics, or dispatch/redelivery ordering. The goal is to reduce local data-structure overhead in the broker consumer pending-ack path. ### Are you willing to submit a PR? - [x] I am willing to submit a PR! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
