Denovo1998 commented on code in PR #24928:
URL: https://github.com/apache/pulsar/pull/24928#discussion_r2596938523


##########
pip/pip-448.md:
##########
@@ -0,0 +1,166 @@
+# PIP-448: Topic-level Delayed Message Tracker for Memory Optimization
+
+# Background knowledge
+
+In Apache Pulsar, **Delayed Message Delivery** allows producers to specify a delay for a message, ensuring it is not delivered to any consumer until the specified time has passed. This is a useful feature for implementing tasks like scheduled reminders or retry mechanisms with backoff.
+
+The legacy default mechanism for handling delayed messages is the `InMemoryDelayedDeliveryTracker`. This tracker is instantiated on a *per-subscription* basis within the broker. When a topic has multiple subscriptions, each subscription gets its own independent `InMemoryDelayedDeliveryTracker` instance.
+
+The consequence of this per-subscription design is that if a delayed message is published to a topic with 'N' subscriptions, that message's metadata (its position) is stored 'N' times in the broker's memory. This leads to significant memory overhead, especially for topics with a large number of subscriptions, as the memory usage scales linearly with the number of subscriptions.
+
+# Motivation
+
+The primary motivation for this proposal is to address the high memory consumption caused by the legacy per-subscription delayed message tracking mechanism. For topics with hundreds or thousands of subscriptions, the memory footprint for delayed messages can become prohibitively large. Each delayed message's position is duplicated across every subscription's tracker, leading to a memory usage pattern of `O(num_delayed_messages * num_subscriptions)`.
+
+This excessive memory usage can cause:
+*   Increased memory pressure on Pulsar brokers.
+*   More frequent and longer Garbage Collection (GC) pauses, impacting broker performance.
+*   Potential OutOfMemoryErrors, leading to broker instability.
+*   Limited scalability for use cases that rely on many subscriptions per topic, such as IoT or large-scale microservices with shared subscriptions.
+
+By introducing an alternative, topic-level tracking mechanism, we can provide a memory-efficient solution to enhance broker stability and scalability for these critical use cases.
+
+# Goals
+
+## In Scope
+*   Introduce a new, optional, topic-level delayed message tracker that is shared across all subscriptions of a single topic. This will store each delayed message's position only once.
+*   Significantly reduce the memory footprint for delayed message handling when this new tracker is enabled, changing the memory complexity from `O(num_delayed_messages * num_subscriptions)` to `O(num_delayed_messages)`.
+*   Provide new configuration options to allow operators to tune the behavior of the new tracker, such as pruning intervals and cleanup delays.
+*   Maintain the existing `DelayedDeliveryTracker` interface to ensure seamless integration with the dispatcher logic.
+*   Preserve the existing per-subscription `InMemoryDelayedDeliveryTrackerFactory` as the default for backward compatibility, requiring operators to opt-in to use the new topic-level tracker.
+
+## Out of Scope
+*   This proposal does not modify the persistent, bucket-based delayed delivery tracker (`BucketDelayedDeliveryTracker`).
+*   No changes will be made to the public-facing client APIs, REST APIs, or the wire protocol. This is a broker-internal optimization.
+*   The semantic behavior of delayed messages from a user's perspective will remain identical.
+
+# High Level Design
+
+The core idea is to introduce a new, opt-in `DelayedDeliveryTrackerFactory` that implements a shared, topic-level tracking strategy. This is achieved with two new components: a `TopicDelayedDeliveryTrackerManager` and a subscription-scoped `InMemoryTopicDelayedDeliveryTracker`.

Review Comment:
   Yes.
   
   At the code level, the new `TopicDelayedDeliveryTrackerManager` already exposes `createOrGetTracker(AbstractPersistentDispatcherMultipleConsumers)`, which returns the `DelayedDeliveryTracker` interface:
   
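   ```java
   public interface TopicDelayedDeliveryTrackerManager extends AutoCloseable {
       // Creates (or returns the existing) subscription-scoped tracker for this dispatcher;
       // the caller only ever sees the DelayedDeliveryTracker interface.
       DelayedDeliveryTracker createOrGetTracker(AbstractPersistentDispatcherMultipleConsumers dispatcher);
       // ...
   }
   ```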
   
   so the dispatcher depends only on the `DelayedDeliveryTracker` interface, not on the concrete in-memory implementation. In this PIP, `InMemoryTopicDelayedDeliveryTracker` is just one implementation, which acts as a proxy to the shared topic-level manager.
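   To make the proxy relationship concrete, here is a minimal, self-contained Java 17 sketch under stated assumptions. It is not the code from PR #24928: the class names (`SharedDelayedIndexSketch`, `Manager`, `View`, `Pos`), the simplified `Tracker` interface, keying trackers by subscription name instead of by dispatcher, and filtering/pruning by each subscription's mark-delete position are all hypothetical. It only illustrates how one topic-level index can hold each delayed position once while every subscription still gets its own tracker object.

   ```java
   import java.util.ArrayList;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;
   import java.util.TreeMap;
   import java.util.TreeSet;

   // Hypothetical sketch: none of these names are Pulsar's real classes or API.
   class SharedDelayedIndexSketch {
       // Stand-in for a (ledgerId, entryId) position, ordered ledger-first.
       record Pos(long ledgerId, long entryId) implements Comparable<Pos> {
           @Override
           public int compareTo(Pos o) {
               int c = Long.compare(ledgerId, o.ledgerId);
               return c != 0 ? c : Long.compare(entryId, o.entryId);
           }
       }

       // Simplified stand-in for the per-subscription interface the dispatcher depends on.
       interface Tracker {
           void addMessage(Pos pos, long deliverAt);
           List<Pos> getScheduledMessages(long now);
           void markDeleted(Pos pos);
       }

       // Topic-level manager: one shared index, one lightweight view per subscription.
       static final class Manager {
           // deliverAt -> positions; a position added by N subscriptions is stored once.
           private final TreeMap<Long, TreeSet<Pos>> byDeliverAt = new TreeMap<>();
           private final Map<String, View> views = new HashMap<>();

           // Keyed by subscription name here (hypothetical; the PIP's manager takes the dispatcher).
           Tracker createOrGetTracker(String subscription) {
               return views.computeIfAbsent(subscription, k -> new View(this));
           }

           void add(Pos pos, long deliverAt) {
               byDeliverAt.computeIfAbsent(deliverAt, k -> new TreeSet<>()).add(pos);
           }

           // Positions due at `now` that are above this subscription's mark-delete position.
           List<Pos> due(long now, Pos markDelete) {
               List<Pos> out = new ArrayList<>();
               for (TreeSet<Pos> set : byDeliverAt.headMap(now, true).values()) {
                   for (Pos p : set) {
                       if (markDelete == null || p.compareTo(markDelete) > 0) out.add(p);
                   }
               }
               return out;
           }

           // Drop positions at or below the slowest subscription's mark-delete position.
           void prune() {
               Pos floor = null;
               for (View v : views.values()) {
                   if (v.markDelete == null) return; // a subscription has acknowledged nothing yet
                   if (floor == null || v.markDelete.compareTo(floor) < 0) floor = v.markDelete;
               }
               if (floor == null) return; // no subscriptions registered
               final Pos f = floor;
               byDeliverAt.values().forEach(set -> set.headSet(f, true).clear());
               byDeliverAt.values().removeIf(TreeSet::isEmpty);
           }

           int indexedPositions() {
               return byDeliverAt.values().stream().mapToInt(TreeSet::size).sum();
           }
       }

       // Subscription-scoped proxy: stores only its own mark-delete position.
       static final class View implements Tracker {
           private final Manager manager;
           private Pos markDelete; // null until the subscription acknowledges something

           View(Manager manager) { this.manager = manager; }

           @Override public void addMessage(Pos pos, long deliverAt) { manager.add(pos, deliverAt); }
           @Override public List<Pos> getScheduledMessages(long now) { return manager.due(now, markDelete); }
           @Override public void markDeleted(Pos pos) { markDelete = pos; manager.prune(); }
       }

       public static void main(String[] args) {
           Manager topic = new Manager();
           Tracker subA = topic.createOrGetTracker("sub-a");
           Tracker subB = topic.createOrGetTracker("sub-b");
           for (Tracker t : List.of(subA, subB)) { // both dispatchers register the same messages
               t.addMessage(new Pos(1, 0), 100);
               t.addMessage(new Pos(1, 1), 200);
           }
           System.out.println(topic.indexedPositions());       // 2, not 4
           subA.markDeleted(new Pos(1, 0));
           System.out.println(subA.getScheduledMessages(150)); // []
           System.out.println(subB.getScheduledMessages(150)); // [Pos[ledgerId=1, entryId=0]]
           subB.markDeleted(new Pos(1, 0));
           System.out.println(topic.indexedPositions());       // 1
       }
   }
   ```

   Two subscriptions registering the same two messages leave two indexed positions rather than four, and an entry is dropped only after every subscription's mark-delete position has passed it. For brevity the sketch prunes eagerly on every acknowledgment and ignores dispatched-but-unacknowledged messages; a real tracker would need to handle both.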
   
   In the PIP text I focused on the in-memory implementation (e.g. `InMemoryTopicDelayedDeliveryTracker*`) because this proposal explicitly keeps `BucketDelayedDeliveryTracker` out of scope (see the “Out of Scope” section). The goal of PIP-448 is to address the memory footprint of the legacy in-memory tracker first, without changing any semantics of the persistent, bucket-based tracker.


