[I] [improve][broker] Speed up PendingAcksMap ack tracking with primitive storage and 50%+ lower memory [pulsar]

via GitHub Sun, 14 Jun 2026 01:48:05 -0700


void-ptr974 opened a new issue, #26027:
URL: https://github.com/apache/pulsar/issues/26027


   ### Search before reporting
   
   - [x] I searched in the issues and found nothing similar.
   
   ### Motivation
   
   `PendingAcksMap` is maintained per consumer and is on the broker-side 
consumer ack path. It tracks pending acknowledgements by `(ledgerId, entryId)` 
and is touched by dispatch, individual ack, partial ack, redelivery, and 
mark-delete cleanup.
   
   The current implementation uses nested ordered maps with boxed keys and 
object values. This keeps the code simple, but it also adds per-entry object 
overhead, allocation rate, and GC pressure when a consumer has a large 
pending-ack window.
   
   This issue tracks a series of focused improvements to reduce that overhead 
without changing consumer ack semantics.
   
   A full draft implementation is available for review and validation:
   
   https://github.com/apache/pulsar/pull/26024
   
   ### Solution
   
   The main insight is that `PendingAcksMap` still needs ordering at the ledger 
level, but it does not need ordered storage inside each ledger.
   
   The outer ledger map must stay ordered because 
`removeAllUpTo(markDeletePosition)` removes whole ledger buckets by ledger 
range.
   
   Inside one ledger, the required operations are exact lookup, exact remove, 
remaining-unacked update, and boundary cleanup for `entryId <= 
markDeleteEntryId`. These operations do not require the inner map to iterate 
entries in sorted order.
   
   Redelivery ordering is not inherited from the per-ledger pending-ack map. 
The redelivery path rebuilds ordered positions separately, so replacing the 
inner ordered map does not change the redelivery ordering contract.
   
   The proposed direction is:
   
   ```text
   TreeMap<ledgerId, LedgerPendingAcks>
   
   LedgerPendingAcks:
     entryId -> packed pending ack value
   ```
   
   The packed value keeps the existing pending-ack fields in a `long`, avoiding 
a per-entry metadata object. A local primitive `Long2LongOpenHashMap` avoids 
boxed key/value map overhead on the hot path.
   
   During benchmarking, the primitive inner map showed the expected benefit for 
exact-key operations, but exposed one trade-off: same-ledger prefix cleanup no 
longer has the cheap `TreeMap.headMap(...).clear()` path.
   
   To address that, the draft adds a lightweight per-ledger `BitSet` index for 
prefix cleanup. The BitSet is only a performance index, not a source of 
pending-ack metadata. If an entry id cannot use the index, cleanup falls back 
to scanning the primitive map, preserving correctness.
   
   The full change can be split into focused PRs:
   
   1. Make `PendingAcksMap#size()` O(1).
   2. Pack pending ack metadata into a `long`.
   3. Add local primitive `Long2LongOpenHashMap`.
   4. Use primitive storage inside `PendingAcksMap`.
   5. Add the BitSet prefix index.
   
   Each step has a separate purpose: remove traversal cost, reduce per-entry 
objects, remove boxing overhead, and recover practical prefix-cleanup 
performance.
   
   ### Alternatives
   
   A few alternatives were considered:
   
   - Keep the current nested `TreeMap` layout. This preserves simple prefix 
cleanup, but keeps the per-entry object and boxing overhead.
   - Copy or reimplement a primitive ordered tree map. This would preserve 
ordered-prefix operations, but the implementation cost and review risk are much 
higher.
   - Add an external primitive collection dependency. This avoids local 
collection code, but introduces dependency and licensing/review overhead for a 
narrow broker hot-path use case.
   
   The proposed approach keeps the ordered structure only where the semantics 
require it, and uses a smaller local primitive structure for the inner hot path.
   
   ### Anything else?
   
   The draft PR includes tests for the changed assumptions:
   
   - unordered inner storage still preserves `removeAllUpTo` behavior;
   - whole-ledger and boundary-ledger cleanup remain correct;
   - entries after the mark-delete position are retained;
   - packed values preserve both `remainingUnacked` and `stickyKeyHash`;
   - negative sticky-key hashes round-trip correctly;
   - BitSet fallback remains correct when the index cannot be used;
   - the primitive map is checked with oracle, random, collision, and edge-case 
tests.
   
   Benchmark and JOL data will be added in comments. The benchmark uses 
datasets tied to real defaults such as `maxUnackedMessagesPerConsumer = 50000`, 
`maxUnackedMessagesPerSubscription = 200000`, and `receiverQueueSize = 1000`.
   
   This does not change the ack protocol, subscription semantics, cursor 
mark-delete semantics, or dispatch/redelivery ordering. The goal is to reduce 
local data-structure overhead in the broker consumer pending-ack path.
   
   ### Are you willing to submit a PR?
   
   - [x] I am willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[I] [improve][broker] Speed up PendingAcksMap ack tracking with primitive storage and 50%+ lower memory [pulsar]

Reply via email to