Denovo1998 commented on issue #25028: URL: https://github.com/apache/pulsar/issues/25028#issuecomment-3601699636
By the way. I have a speech this Saturday in China about Pulsar delayed messages, so I can briefly mention it here. **There are mainly two implementation methods for delayed messages:** --- ### 1. Delayed Production This means that after the Broker receives a message, it first stores the message in some storage (such as a built-in Topic), and there is an independent thread to determine whether the data has expired. If the data expires, the message is then written to the actual Topic, allowing the consumer to consume the data normally. **Advantages: Minimal changes to the main production and consumption flow. Only need to differentiate delayed messages when writing, write them to temporary storage, and independently implement timed detection to write expired data to the target Topic.** **Disadvantages: When the message volume is large, the message expiration time may not be accurate.** #### From industry implementations, most choose this method. ### 2. Delayed Visibility Each time consumption occurs, check if there are expired delayed messages. If yes, pull the delayed messages from temporary storage and return them to the consumer. Delayed messages that have not expired are filtered out and kept invisible. In this sub-scheme, when the producer writes data, it generally also writes the data to temporary storage. But unlike the previous method, each time consumption actively checks whether there are messages expired in the third-party storage. If so, the expired data is returned to the client. **Advantages: Eliminates the need for timed thread detection and writing logic, simplifying the flow significantly. Moreover, this method makes delayed messages highly accurate.** **Disadvantages: The QPS of consumption operations is usually very high, so this temporary storage must have very high performance for retrieval operations, and memory usage cannot be too high. Additionally, checking for delayed messages every time may cause performance issues.** --- In Pulsar, it is implemented through method two, delayed visibility (with some differences), and as with Pulsar's ack design, it affects normal consumption and production. So I propose a hybrid architecture. There is such an external service (although it is an open-source project under Apache-2.0 license, I won't mention its name here), which is implemented using the delayed production method. The client sends an "add/cancel" delayed message command (including the message payload). After receiving this command from an internal topic, it persists the message to be triggered in the future to RocksDB, scans for expiration by the second, and then delivers it to the target Topic. We can hand over delayed messages that require "complex delay capabilities" (can be canceled, can have recurring delays) and have long delay times (day-level or even month-level) to this external service for processing. If high-precision second-level or minute-level delayed messages are needed, continue using Pulsar's internal implementation (recommended to use InMemory implementation). I think such a hybrid architecture can well solve many performance issues with delayed messages in Pulsar currently. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
