Re: [I] [Bug] big individuallyDeletedMessages causes message dispatching hangs [pulsar]

via GitHub Tue, 02 Dec 2025 04:05:38 -0800


Denovo1998 commented on issue #25028:
URL: https://github.com/apache/pulsar/issues/25028#issuecomment-3601699636


   By the way. I have a speech this Saturday in China about Pulsar delayed 
messages, so I can briefly mention it here.
   
   **There are mainly two implementation methods for delayed messages:**
   
   ---
   
   ### 1. Delayed Production
   
   This means that after the Broker receives a message, it first stores the 
message in some storage (such as a built-in Topic), and there is an independent 
thread to determine whether the data has expired. If the data expires, the 
message is then written to the actual Topic, allowing the consumer to consume 
the data normally.
   
   **Advantages: Minimal changes to the main production and consumption flow. 
Only need to differentiate delayed messages when writing, write them to 
temporary storage, and independently implement timed detection to write expired 
data to the target Topic.**
   
   **Disadvantages: When the message volume is large, the message expiration 
time may not be accurate.**
   
   #### From industry implementations, most choose this method.
   
   ### 2. Delayed Visibility
   
   Each time consumption occurs, check if there are expired delayed messages. 
If yes, pull the delayed messages from temporary storage and return them to the 
consumer. Delayed messages that have not expired are filtered out and kept 
invisible.
   
   In this sub-scheme, when the producer writes data, it generally also writes 
the data to temporary storage. But unlike the previous method, each time 
consumption actively checks whether there are messages expired in the 
third-party storage. If so, the expired data is returned to the client.
   
   **Advantages: Eliminates the need for timed thread detection and writing 
logic, simplifying the flow significantly. Moreover, this method makes delayed 
messages highly accurate.**
   
   **Disadvantages: The QPS of consumption operations is usually very high, so 
this temporary storage must have very high performance for retrieval 
operations, and memory usage cannot be too high. Additionally, checking for 
delayed messages every time may cause performance issues.**
   
   ---
   
   In Pulsar, it is implemented through method two, delayed visibility (with 
some differences), and as with Pulsar's ack design, it affects normal 
consumption and production.
   
   So I propose a hybrid architecture.
   
   There is such an external service (although it is an open-source project 
under Apache-2.0 license, I won't mention its name here), which is implemented 
using the delayed production method. The client sends an "add/cancel" delayed 
message command (including the message payload). After receiving this command 
from an internal topic, it persists the message to be triggered in the future 
to RocksDB, scans for expiration by the second, and then delivers it to the 
target Topic.
   
   We can hand over delayed messages that require "complex delay capabilities" 
(can be canceled, can have recurring delays) and have long delay times 
(day-level or even month-level) to this external service for processing. If 
high-precision second-level or minute-level delayed messages are needed, 
continue using Pulsar's internal implementation (recommended to use InMemory 
implementation). I think such a hybrid architecture can well solve many 
performance issues with delayed messages in Pulsar currently.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [I] [Bug] big individuallyDeletedMessages causes message dispatching hangs [pulsar]

Reply via email to