lhotari opened a new pull request, #25992:
URL: https://github.com/apache/pulsar/pull/25992
### Motivation
The default value of `managedLedgerMaxUnackedRangesToPersist` (`10000`) is
very low and commonly causes
issues in production. This setting bounds how many individually-acknowledged
ranges (acknowledgment
"holes") a subscription can persist. When the number of holes exceeds the
limit, the broker stops
persisting further ranges, so after a broker restart or a topic
unload/reload some already-acknowledged
messages are redelivered. The limit is commonly reached by:
- Subscriptions using the **delayed delivery** feature, where many messages
are acknowledged out of order
while delayed messages stay unacknowledged.
- **Shared / Key_Shared subscriptions where per-message processing time
varies a lot**, which produces
many simultaneous acknowledgment holes.
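
To make the notion of a persisted "range" concrete, here is a minimal sketch that counts the maximal runs of consecutively acknowledged entries ahead of the mark-delete position. The class `AckRangeSketch` and the method `countAckedRanges` are hypothetical names used only for illustration. This is not Pulsar's managed-cursor code, and it models positions as plain entry ids rather than ledger/entry pairs.

```java
import java.util.TreeSet;

// Illustrative only: not Pulsar's managed-cursor implementation.
// Positions are simplified to plain entry ids (an assumption of this sketch).
class AckRangeSketch {

    // Counts maximal runs of consecutive acknowledged entry ids.
    // Each run corresponds to one individually-acknowledged range.
    static int countAckedRanges(long[] ackedEntryIds) {
        TreeSet<Long> sorted = new TreeSet<>();
        for (long id : ackedEntryIds) {
            sorted.add(id);
        }
        int ranges = 0;
        Long previous = null;
        for (long id : sorted) {
            // A new range starts whenever the id does not directly follow the previous one.
            if (previous == null || id != previous + 1) {
                ranges++;
            }
            previous = id;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Entries 4, 7 and 8 are still unacknowledged (for example, delayed messages).
        long[] acked = {1, 2, 3, 5, 6, 9};
        // Three ranges: [1..3], [5..6], [9..9]
        System.out.println(countAckedRanges(acked));
    }
}
```

In this simplified model, a workload that acknowledges every other message produces one range per acknowledged message, which is how a busy subscription can reach a limit of `10000` quickly.
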
`managedLedgerMaxUnackedRangesToPersist` also bounds the Key_Shared
look-ahead limit
`keySharedLookAheadMsgInReplayThresholdPerSubscription` (which must stay
below
`2 * managedLedgerMaxUnackedRangesToPersist`). With the old low default,
workloads with **few keys** could
stall: once the replay queue reaches
`keySharedLookAheadMsgInReplayThresholdPerSubscription`, the
dispatcher pauses and stops pulling new messages from the backlog, so other
consumers sit idle even
though there is work they could be processing. Raising the defaults gives
these workloads more headroom.
Users whose workloads have **low key cardinality** can raise
`keySharedLookAheadMsgInReplayThresholdPerSubscription` further when needed.
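
When raising the look-ahead threshold, the relationship to `managedLedgerMaxUnackedRangesToPersist` described above can be checked before deploying a configuration. The following is a minimal sketch under stated assumptions: `LookAheadLimitSketch`, `parse`, and `lookAheadWithinBound` are hypothetical helpers, not part of Pulsar, and the strict "below" comparison follows the wording of this description rather than the broker source.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustrative only: a standalone check, not a Pulsar API.
class LookAheadLimitSketch {

    // The two new defaults from this change, in broker.conf key=value form.
    static final String NEW_DEFAULTS =
            "managedLedgerMaxUnackedRangesToPersist=200000\n"
            + "keySharedLookAheadMsgInReplayThresholdPerSubscription=40000\n";

    // Parses key=value configuration text into a Properties object.
    static Properties parse(String text) {
        Properties conf = new Properties();
        try {
            conf.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return conf;
    }

    // Returns true when the per-subscription look-ahead threshold is below
    // 2 * managedLedgerMaxUnackedRangesToPersist (strictness is an assumption).
    static boolean lookAheadWithinBound(Properties conf) {
        long maxRanges = Long.parseLong(
                conf.getProperty("managedLedgerMaxUnackedRangesToPersist"));
        long lookAhead = Long.parseLong(
                conf.getProperty("keySharedLookAheadMsgInReplayThresholdPerSubscription"));
        return lookAhead < 2 * maxRanges;
    }

    public static void main(String[] args) {
        // With the new defaults: 40000 < 2 * 200000, so the check passes.
        System.out.println(lookAheadWithinBound(parse(NEW_DEFAULTS)));
    }
}
```
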
`keySharedLookAheadMsgInReplayThresholdPerSubscription` should be sized relative to the broker cache size
(`managedLedgerCacheSizeMB`) and the broker's total workload. This keeps the cache hit ratio high and
avoids unnecessary reads from BookKeeper when messages in the replay queue are evicted from the cache
because of cache limits. PIP-430 (available since Pulsar 4.1.x) optimizes Key_Shared caching so that
messages in the replay queue are kept in the cache longer, but they are still eventually evicted if the
cache fills up.
These changes provide better defaults for all Pulsar users.
**Backward compatibility:** there are no backward-compatibility concerns.
All settings modified here are
also available on Pulsar 4.0.x (since 4.0.3) and are fully compatible. The
cursor/ledger-info metadata
compression read path is self-describing, so a broker transparently reads
both compressed and
uncompressed metadata regardless of the configured compression type.
**Upgrade note:** users moving to 5.0.x should first upgrade to the latest 4.0.x or 4.2.x release,
so that a downgrade remains possible if any issue is experienced.
### Modifications
Changed the following defaults consistently in `conf/broker.conf`,
`conf/standalone.conf`, and
`ServiceConfiguration.java`:
| Setting | Old | New |
|---|---|---|
| `managedLedgerMaxUnackedRangesToPersist` | 10000 | 200000 |
| `managedLedgerMaxBatchDeletedIndexToPersist` | 10000 | 200000 |
| `managedLedgerMaxUnackedRangesToPersistInMetadataStore` | 1000 | 200000 |
| `managedCursorInfoCompressionType` | NONE | LZ4 |
| `managedLedgerInfoCompressionType` | NONE | LZ4 |
| `keySharedLookAheadMsgInReplayThresholdPerConsumer` | 2000 | 4000 |
| `keySharedLookAheadMsgInReplayThresholdPerSubscription` | 20000 | 40000 |
Enabling LZ4 compression for `managedCursorInfo` and `managedLedgerInfo`
keeps the larger persisted state
well within the metadata-store znode size limits.
The documentation of `managedLedgerMaxUnackedRangesToPersist` now also notes
that when
`managedLedgerPersistIndividualAckAsLongArray` is enabled (the default), the
persisted size is bounded by
the **backlog size** (the range of entries the cursor spans), not by this
max number of unacked ranges.
For BookKeeper ledger storage, with the default broker `maxMessageSize` and BookKeeper
`nettyMaxFrameSizeBytes`, the state fits a backlog of about 30M entries. This estimate excludes
`managedLedgerMaxBatchDeletedIndexToPersist`, whose size instead scales with the number of
acknowledgment holes. See #25985 for background on this storage-format limitation.
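
As a rough plausibility check of the backlog-bounded size, the sketch below estimates the bytes needed if the long-array format stores one bit per entry in the range the cursor spans. That one-bit-per-entry model, the 5 MiB value used for the default `maxMessageSize`, and the names `AckStateSizeSketch` and `bitmapBytes` are assumptions of this example. It ignores per-ledger bookkeeping and serialization overhead, so it is not the exact accounting behind the 30M figure.

```java
// Illustrative only: back-of-the-envelope sizing, not Pulsar's serialization code.
class AckStateSizeSketch {

    // Assumed default broker maxMessageSize (5 MiB).
    static final long ASSUMED_MAX_MESSAGE_SIZE_BYTES = 5L * 1024 * 1024;

    // Bytes needed for a bitmap with one bit per backlog entry,
    // rounded up to whole 64-bit words (assumption: 1 bit per entry).
    static long bitmapBytes(long backlogEntries) {
        long words = (backlogEntries + 63) / 64;
        return words * Long.BYTES;
    }

    public static void main(String[] args) {
        long bytes = bitmapBytes(30_000_000L);
        // 30,000,000 bits = 468,750 words = 3,750,000 bytes
        System.out.println(bytes);
        // true: the bitmap alone stays under the assumed 5 MiB limit
        System.out.println(bytes < ASSUMED_MAX_MESSAGE_SIZE_BYTES);
    }
}
```

Under these assumptions a 30M-entry backlog needs about 3.75 MB for the bitmap alone, which leaves headroom under the assumed limit for the rest of the cursor state.
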
### Verifying this change
- [x] Make sure that the change passes the CI checks.
This change only modifies configuration default values. It is covered by the existing tests that
load and parse the broker configuration, and the affected behavior is exercised by the existing
managed-cursor and Key_Shared dispatcher test suites.
### Does this pull request potentially affect one of the following parts:
*If the box was checked, please highlight the changes*
- [x] The default values of configurations
The default values listed in the table above change. The
persisted/serialized formats and the wire
protocol are unchanged, so brokers remain compatible with older and newer
brokers and clients.