There's a follow-up discussion on the PIP PR: https://github.com/apache/pulsar/pull/25706. I've shared a detailed write-up in https://github.com/apache/pulsar/pull/25706#issuecomment-4396605560.
It would be valuable to gather more thoughts and additional perspectives on the various options for solving the problem stated in PIP-474 before we decide on the final solution. To make Alternative 3 a bit more concrete, I've also appended a rough client-side sketch after the quoted thread below.

-Lari

On 2026/05/07 07:42:29 Lari Hotari wrote:
> Thanks for bringing up a real problem and driving the work to solve this issue.
>
> I'd suggest analyzing 3 alternative designs before deciding on the solution.
>
> Alternative 1:
> I'd suggest looking into an alternative design that achieves the same outcome of allowing the subscription cursor to advance. Instead of making copies of the messages, an alternative design would be to create another subscription to track the slow or hot keys. Essentially, the design could be very similar to diverting to the overflow managed ledger, but there wouldn't be a need to duplicate the data and get into a situation where different failure modes cause unnecessary complications.
>
> Alternative 2:
> Simply optimize the replay queue solution together with improving the scalability of individualDeletedMessages so that it scales to 1,000,000 ack holes and beyond. This would result in the simplest solution, which would cover most use cases. There are multiple benefits to keeping the solution simple. For example, backlog management doesn't change.
>
> Together with the PIP-430 broker cache (since 4.1.0), the replay queue solution already avoids most unnecessary BK reads when the broker cache is sufficiently tuned for high-scale use cases. The PIP-430 broker cache could be improved further to achieve high cache hit rates if it turns out to be a problem.
>
> Alternative 3:
> The client-side code could simply route to a separate topic on its own when it detects a hot key and acknowledge the original message.
>
> Regarding Alternative 2, I believe that individualDeletedMessages can already scale to 1,000,000 ack holes and beyond when the broker is properly configured. It could be tested with this type of configuration:
>
> managedLedgerMaxUnackedRangesToPersist=1000000
> managedLedgerMaxBatchDeletedIndexToPersist=1000000
> managedLedgerPersistIndividualAckAsLongArray=true
> managedCursorInfoCompressionType=LZ4
> managedLedgerInfoCompressionType=LZ4
>
> (The last config is unrelated, but it makes sense to also switch to using compression.)
>
> I hope you could also analyze these alternatives before we proceed with making the decision on solving the hot (or slow) key problem. Thank you for focusing on solving this problem!
>
> -Lari
>
> On 2026/05/07 05:18:35 xiangying meng wrote:
> > Hi all,
> >
> > I'd like to propose PIP-474: Key_Shared Hot Key Overflow Mechanism.
> >
> > Key_Shared is Pulsar's only built-in solution for parallel consumption with per-key ordering. But it has a critical production issue: a single stuck consumer can starve ALL other keys across ALL partitions within minutes, due to the containsStickyKeyHash ordering check flooding the Replay queue.
> >
> > This becomes especially urgent as AI inference workloads adopt MQ as their transport layer: slow consumption (seconds per request) plus strict per-key ordering is exactly what Key_Shared is designed for, yet the hot-key starvation bug makes it unusable in production.
> >
> > PIP-474 proposes diverting hot-key messages to an independent Overflow ManagedLedger, unblocking Normal Read and mark-delete advancement while preserving at-least-once delivery and per-key ordering. Zero overhead when no hot keys are present.
> >
> > PIP: https://github.com/apache/pulsar/pull/25706
> >
> > Feedback welcome.
> >
> > Thanks,
> > Xiangying Meng
> >
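
Appendix: a minimal client-side sketch of Alternative 3, using the Pulsar Java client. This is only an illustration under assumptions of my own: the topic names, subscription name, hot-key threshold, and the naive per-key message counter used for hot-key detection are all hypothetical, and a real implementation would need windowed detection, error handling, and a dedicated consumer on the overflow topic.

import java.util.HashMap;
import java.util.Map;
import org.apache.pulsar.client.api.*;

public class HotKeyClientSideRouting {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Hypothetical overflow topic; a real deployment would pick its own naming scheme.
        Producer<byte[]> overflowProducer = client.newProducer()
                .topic("persistent://public/default/orders-hot-key-overflow")
                .create();

        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/orders")
                .subscriptionName("order-processor")
                .subscriptionType(SubscriptionType.Key_Shared)
                .subscribe();

        // Naive hot-key detection: count messages seen per key. A real
        // implementation would use a sliding window or a processing-latency signal.
        Map<String, Integer> perKeyCount = new HashMap<>();
        int hotKeyThreshold = 1000; // hypothetical threshold

        while (true) {
            Message<byte[]> msg = consumer.receive();
            String key = msg.hasKey() ? msg.getKey() : "";
            int seen = perKeyCount.merge(key, 1, Integer::sum);

            if (seen > hotKeyThreshold) {
                // Divert the hot key: re-publish to the overflow topic with the
                // same key, then acknowledge the original message so the
                // subscription cursor on the main topic can keep advancing.
                overflowProducer.newMessage()
                        .key(key)
                        .value(msg.getValue())
                        .send();
                consumer.acknowledge(msg);
            } else {
                process(msg);              // application-specific (slow) handling
                consumer.acknowledge(msg);
            }
        }
    }

    private static void process(Message<byte[]> msg) {
        // placeholder for the application's per-message work
    }
}

One caveat with this approach: once a key flips between the normal path and the overflow topic, strict per-key ordering across the two paths is no longer guaranteed by the broker, so the application would need to tolerate or reconcile that.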
