lhotari opened a new pull request, #25992:
URL: https://github.com/apache/pulsar/pull/25992

   ### Motivation
   
   The default value of `managedLedgerMaxUnackedRangesToPersist` (`10000`) is 
very low and commonly causes
   issues in production. This setting bounds how many individually acknowledged 
ranges (acknowledgment
   "holes") a subscription can persist. When the number of holes exceeds the 
limit, the broker stops
   persisting further ranges, so after a broker restart or a topic 
unload/reload some already-acknowledged
   messages are redelivered. The limit is commonly reached by:
   
   - Subscriptions using the **delayed delivery** feature, where many messages 
are acknowledged out of order
     while delayed messages stay unacknowledged.
   - **Shared / Key_Shared subscriptions where per-message processing time 
varies a lot**, which produces
     many simultaneous acknowledgment holes.
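
The effect of the limit can be illustrated with a simplified model. This is a sketch, not the broker's implementation: it ignores the mark-delete position and assumes that the earliest ranges are the ones persisted once the limit is hit. The function names `ack_ranges` and `redelivered_after_restart` are hypothetical.

```python
def ack_ranges(acked_ids):
    """Collapse individually acknowledged entry ids into closed (start, end) ranges."""
    ranges = []
    for entry_id in sorted(set(acked_ids)):
        if ranges and entry_id == ranges[-1][1] + 1:
            # Extends the previous range, so no new hole is created.
            ranges[-1][1] = entry_id
        else:
            ranges.append([entry_id, entry_id])
    return [tuple(r) for r in ranges]


def redelivered_after_restart(acked_ids, max_ranges_to_persist):
    """Return acked ids whose range was not persisted (assumed: earliest ranges win)."""
    persisted = ack_ranges(acked_ids)[:max_ranges_to_persist]
    covered = {i for start, end in persisted for i in range(start, end + 1)}
    return sorted(set(acked_ids) - covered)


# Every second entry is acknowledged, so each ack forms its own range.
acked = list(range(1, 20, 2))  # 10 single-entry ranges
lost = redelivered_after_restart(acked, max_ranges_to_persist=4)
```

In this model, only the first four ranges survive a restart, so the six later acknowledgments are lost and those messages would be delivered again.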
   
   `managedLedgerMaxUnackedRangesToPersist` also bounds the Key_Shared 
look-ahead limit
   `keySharedLookAheadMsgInReplayThresholdPerSubscription` (which must stay 
below
   `2 * managedLedgerMaxUnackedRangesToPersist`). With the old low default, 
workloads with **few keys** could
   stall: once the replay queue reaches 
`keySharedLookAheadMsgInReplayThresholdPerSubscription`, the
   dispatcher pauses and stops pulling new messages from the backlog, so other 
consumers sit idle even
   though there is work they could be processing. Raising the defaults gives 
these workloads more headroom.
   Users whose workloads have **low key cardinality** can raise
   `keySharedLookAheadMsgInReplayThresholdPerSubscription` further when needed.
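
As a rough illustration of the headroom, the relationship between the two settings can be checked as follows. This is a minimal sketch that reads "must stay below" as a strict inequality; the helper names and the override values 300000 and 400000 are hypothetical, not recommendations.

```python
def lookahead_upper_bound(max_unacked_ranges_to_persist):
    """Exclusive upper bound for the per-subscription look-ahead threshold."""
    return 2 * max_unacked_ranges_to_persist


def is_valid_lookahead(per_subscription_threshold, max_unacked_ranges_to_persist):
    """True when the threshold stays below twice the persisted-range limit."""
    return per_subscription_threshold < lookahead_upper_bound(max_unacked_ranges_to_persist)


# New defaults proposed in this PR.
NEW_MAX_UNACKED_RANGES = 200_000
NEW_LOOKAHEAD_PER_SUBSCRIPTION = 40_000

bound = lookahead_upper_bound(NEW_MAX_UNACKED_RANGES)
default_ok = is_valid_lookahead(NEW_LOOKAHEAD_PER_SUBSCRIPTION, NEW_MAX_UNACKED_RANGES)

# Hypothetical overrides for a workload with low key cardinality.
raised_ok = is_valid_lookahead(300_000, NEW_MAX_UNACKED_RANGES)
too_high = is_valid_lookahead(400_000, NEW_MAX_UNACKED_RANGES)
```

With the new defaults the bound is 400000, so the default threshold of 40000 leaves substantial room to raise it.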
   
   `keySharedLookAheadMsgInReplayThresholdPerSubscription` should be sized in
   proportion to the broker cache size (`managedLedgerCacheSizeMB`) and the
   broker's total workload. This keeps the cache hit ratio high and avoids
   unnecessary reads from BookKeeper when messages in the replay queue are
   evicted from the cache due to cache limits. PIP-430 (available since Pulsar
   4.1.x) optimizes Key_Shared caching so that messages in the replay queue are
   kept in the cache longer, but they are still eventually evicted if the cache
   fills up.
   
   These changes provide better defaults for all Pulsar users.
   
   **Backward compatibility:** there are no backward-compatibility concerns. 
All settings modified here are
   also available on Pulsar 4.0.x (since 4.0.3) and are fully compatible. The 
cursor/ledger-info metadata
   compression read path is self-describing, so a broker transparently reads 
both compressed and
   uncompressed metadata regardless of the configured compression type.
   
   **Upgrade note:** before upgrading to 5.0.x, users should first upgrade to the
   latest 4.0.x or 4.2.x release, so that a downgrade remains possible if any
   issue is encountered.
   
   ### Modifications
   
   Changed the following defaults consistently in `conf/broker.conf`, 
`conf/standalone.conf`, and
   `ServiceConfiguration.java`:
   
   | Setting | Old | New |
   |---|---|---|
   | `managedLedgerMaxUnackedRangesToPersist` | 10000 | 200000 |
   | `managedLedgerMaxBatchDeletedIndexToPersist` | 10000 | 200000 |
   | `managedLedgerMaxUnackedRangesToPersistInMetadataStore` | 1000 | 200000 |
   | `managedCursorInfoCompressionType` | NONE | LZ4 |
   | `managedLedgerInfoCompressionType` | NONE | LZ4 |
   | `keySharedLookAheadMsgInReplayThresholdPerConsumer` | 2000 | 4000 |
   | `keySharedLookAheadMsgInReplayThresholdPerSubscription` | 20000 | 40000 |
   
   Enabling LZ4 compression for `managedCursorInfo` and `managedLedgerInfo` 
keeps the larger persisted state
   well within the metadata-store znode size limits.
   
   The documentation of `managedLedgerMaxUnackedRangesToPersist` now also notes
   that when `managedLedgerPersistIndividualAckAsLongArray` is enabled (the
   default), the persisted size is bounded by the **backlog size** (the range of
   entries the cursor spans), not by this maximum number of unacked ranges. For
   BookKeeper ledger storage, with the default broker `maxMessageSize` and
   BookKeeper `nettyMaxFrameSizeBytes`, the state fits a backlog of about 30M
   entries. This estimate excludes the state governed by
   `managedLedgerMaxBatchDeletedIndexToPersist`, whose size instead scales with
   the number of acknowledgment holes. See #25985 for background on this
   storage-format limitation.
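
To show why the persisted size follows the backlog rather than the number of holes, here is a back-of-the-envelope sketch. It assumes one bit per backlog entry packed into 64-bit words and does not model serialization overhead, so the real persisted size is larger. The function name `long_array_bytes` is hypothetical.

```python
def long_array_bytes(backlog_entries):
    """Raw size of a bitmap with one bit per entry, stored as 64-bit words."""
    words = (backlog_entries + 63) // 64  # round up to whole words
    return words * 8


# The raw size depends only on how many entries the cursor spans,
# not on how many acknowledgment holes exist within that span.
raw_bytes = long_array_bytes(30_000_000)
```

Under these assumptions, a backlog of 30M entries needs 3,750,000 bytes of raw bitmap data regardless of how fragmented the acknowledgments are.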
   
   ### Verifying this change
   
   - [x] Make sure that the change passes the CI checks.
   
   This change only modifies configuration default values. It is covered by the
   existing tests that load and parse the broker configuration, and the affected
   behavior is exercised by the existing managed-cursor and Key_Shared dispatcher
   test suites.
   
   ### Does this pull request potentially affect one of the following parts:
   
   *If the box was checked, please highlight the changes*
   
   - [x] The default values of configurations
   
   The default values listed in the table above change. The 
persisted/serialized formats and the wire
   protocol are unchanged, so brokers remain compatible with older and newer 
brokers and clients.
   

