Denovo1998 commented on issue #25028: URL: https://github.com/apache/pulsar/issues/25028#issuecomment-3601641600
@YanshuoH PR #24739:

- replaces StampedLock/synchronized with ReentrantReadWriteLock and narrows the write‑lock sections in addMessage / getScheduledMessages,
- moves snapshot persistence/loading to asynchronous code paths so that tracker operations do not block on I/O, and
- decouples the tracker from the dispatcher to avoid lock‑ordering issues and make thread safety easier to test.

These changes are intended to reduce lock contention and keep the tracker itself from becoming a bottleneck when there are many delayed messages.

---

However, your issue also clearly shows very large values for numberOfEntriesSinceFirstNotAckedMessage and totalNonContiguousDeletedMessagesRange. That suggests a large part of the cost comes from the ManagedLedger acknowledgment state (many “ack holes”) and the configured limit managedLedgerMaxUnackedRangesToPersist. PR #24739 does not change how that ack state is represented or persisted, so it is not expected to fully resolve the behavior you’re seeing on its own.

In addition to the tuning that has already been discussed (compression, receiverQueueSize, etc.), the most promising mitigation on the usage side seems to be what @lhotari already suggested:

- **grouping messages with similar delays into separate topics, so that each topic maintains a smaller ack state; the consumer can subscribe to multiple topics to preserve the overall behavior.**

**Given how delayed messages are currently implemented in Pulsar, this may be the only solution.**

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
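As a rough illustration of the "group similar delays into separate topics" idea mentioned above, here is a minimal sketch of how a producer-side helper might pick a topic from the delivery delay. Everything in it is an assumption for illustration: the class name `DelayBucketRouter`, the `-delay-N` topic suffix, and the bucket boundaries are hypothetical and are not part of Pulsar or of PR #24739.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not a Pulsar API): maps a delivery delay to one of a
 * small, fixed set of topics so that each topic only carries similar delays.
 */
final class DelayBucketRouter {
    private final String baseTopic;
    // Inclusive upper bounds of each bucket, in ascending order (assumed values).
    private final List<Duration> upperBounds;

    DelayBucketRouter(String baseTopic, List<Duration> upperBounds) {
        this.baseTopic = baseTopic;
        this.upperBounds = new ArrayList<>(upperBounds);
    }

    /** Returns the topic of the first bucket whose upper bound is >= delay. */
    String topicFor(Duration delay) {
        for (int i = 0; i < upperBounds.size(); i++) {
            if (delay.compareTo(upperBounds.get(i)) <= 0) {
                return baseTopic + "-delay-" + i;
            }
        }
        // Delays longer than the last bound share one overflow bucket.
        return baseTopic + "-delay-" + upperBounds.size();
    }

    /** Returns every bucket topic, i.e. the list a consumer would subscribe to. */
    List<String> allTopics() {
        List<String> topics = new ArrayList<>();
        for (int i = 0; i <= upperBounds.size(); i++) {
            topics.add(baseTopic + "-delay-" + i);
        }
        return topics;
    }
}
```

With the Pulsar Java client, the producer side would presumably send to `topicFor(delay)` using `newMessage().deliverAfter(...)`, and the consumer would pass `allTopics()` to `ConsumerBuilder.topics(...)` under a single subscription. Ordering across the bucket topics is not something this sketch tries to preserve, and the right bucket boundaries depend on the workload.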
