Denovo1998 commented on code in PR #24928:
URL: https://github.com/apache/pulsar/pull/24928#discussion_r2596938523
##########
pip/pip-448.md:
##########
@@ -0,0 +1,166 @@
+# PIP-448: Topic-level Delayed Message Tracker for Memory Optimization
+
+# Background knowledge
+
+In Apache Pulsar, **Delayed Message Delivery** allows producers to specify a
delay for a message, ensuring it is not delivered to any consumer until the
specified time has passed. This is a useful feature for implementing tasks like
scheduled reminders or retry mechanisms with backoff.
+
+The legacy default mechanism for handling delayed messages is the
`InMemoryDelayedDeliveryTracker`. This tracker is instantiated on a
*per-subscription* basis within the broker. When a topic has multiple
subscriptions, each subscription gets its own independent
`InMemoryDelayedDeliveryTracker` instance.
+
+The consequence of this per-subscription design is that if a delayed message
is published to a topic with 'N' subscriptions, that message's metadata (its
position) is stored 'N' times in the broker's memory. This leads to significant
memory overhead, especially for topics with a large number of subscriptions, as
the memory usage scales linearly with the number of subscriptions.
+
+# Motivation
+
+The primary motivation for this proposal is to address the high memory
consumption caused by the legacy per-subscription delayed message tracking
mechanism. For topics with hundreds or thousands of subscriptions, the memory
footprint for delayed messages can become prohibitively large. Each delayed
message's position is duplicated across every subscription's tracker, leading
to a memory usage pattern of `O(num_delayed_messages * num_subscriptions)`.
+
+This excessive memory usage can cause:
+* Increased memory pressure on Pulsar brokers.
+* More frequent and longer Garbage Collection (GC) pauses, impacting broker
performance.
+* Potential OutOfMemoryErrors, leading to broker instability.
+* Limited scalability for use cases that rely on many subscriptions per
topic, such as IoT or large-scale microservices with shared subscriptions.
+
+By introducing an alternative, topic-level tracking mechanism, we can provide
a memory-efficient solution to enhance broker stability and scalability for
these critical use cases.
+
+# Goals
+
+## In Scope
+* Introduce a new, optional, topic-level delayed message tracker that is
shared across all subscriptions of a single topic. This will store each delayed
message's position only once.
+* Significantly reduce the memory footprint for delayed message handling
when this new tracker is enabled, changing the memory complexity from
`O(num_delayed_messages * num_subscriptions)` to `O(num_delayed_messages)`.
+* Provide new configuration options to allow operators to tune the behavior
of the new tracker, such as pruning intervals and cleanup delays.
+* Maintain the existing `DelayedDeliveryTracker` interface to ensure
seamless integration with the dispatcher logic.
+* Preserve the existing per-subscription
`InMemoryDelayedDeliveryTrackerFactory` as the default for backward
compatibility, requiring operators to opt in to use the new topic-level tracker.
+
+## Out of Scope
+* This proposal does not modify the persistent, bucket-based delayed
delivery tracker (`BucketDelayedDeliveryTracker`).
+* No changes will be made to the public-facing client APIs, REST APIs, or
the wire protocol. This is a broker-internal optimization.
+* The semantic behavior of delayed messages from a user's perspective will
remain identical.
+
+# High Level Design
+
+The core idea is to introduce a new, opt-in `DelayedDeliveryTrackerFactory`
that implements a shared, topic-level tracking strategy. This is achieved with
two new components: a `TopicDelayedDeliveryTrackerManager` and a
subscription-scoped `InMemoryTopicDelayedDeliveryTracker`.
Review Comment:
Yes.
At the code level, the new `TopicDelayedDeliveryTrackerManager` already
exposes `createOrGetTracker(AbstractPersistentDispatcherMultipleConsumers)`,
which returns the `DelayedDeliveryTracker` interface:
```java
public interface TopicDelayedDeliveryTrackerManager extends AutoCloseable {
    DelayedDeliveryTracker createOrGetTracker(
            AbstractPersistentDispatcherMultipleConsumers dispatcher);
    // ...
}
```
so the dispatcher depends only on the `DelayedDeliveryTracker` interface and
not on the concrete in-memory implementation. In this PIP, the
`InMemoryTopicDelayedDeliveryTracker` is just one implementation of that
interface; it acts as a proxy to the shared topic-level manager.
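To make the proxy relationship concrete, here is a minimal, self-contained sketch. It is not the code from this PR: `Tracker`, `TopicManager`, `ProxyTracker`, `storedEntries()`, the use of a subscription name as the key, and the use of plain `long` values for positions are all hypothetical simplifications chosen for illustration. Per-subscription state such as cursor positions is omitted entirely.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class SharedTrackerSketch {

    /** Hypothetical stand-in for the tracker interface the dispatcher sees. */
    interface Tracker {
        void addMessage(long position, long deliverAtMillis);
        boolean hasMessageAvailable(long nowMillis);
    }

    /** Hypothetical topic-level manager: each delayed position is stored once. */
    static final class TopicManager {
        // deliverAt timestamp -> positions, shared by every subscription of the topic
        private final NavigableMap<Long, Set<Long>> index = new TreeMap<>();
        // one lightweight proxy per subscription
        private final Map<String, Tracker> trackers = new HashMap<>();

        synchronized Tracker createOrGetTracker(String subscriptionName) {
            return trackers.computeIfAbsent(subscriptionName, name -> new ProxyTracker(this));
        }

        synchronized void add(long position, long deliverAtMillis) {
            // The set deduplicates, so N subscriptions adding the same position cost one entry.
            index.computeIfAbsent(deliverAtMillis, t -> new TreeSet<>()).add(position);
        }

        synchronized boolean hasDue(long nowMillis) {
            return !index.isEmpty() && index.firstKey() <= nowMillis;
        }

        synchronized int storedEntries() {
            int count = 0;
            for (Set<Long> positions : index.values()) {
                count += positions.size();
            }
            return count;
        }
    }

    /** Subscription-scoped proxy: holds no message index of its own, only delegates. */
    static final class ProxyTracker implements Tracker {
        private final TopicManager manager;

        ProxyTracker(TopicManager manager) {
            this.manager = manager;
        }

        @Override
        public void addMessage(long position, long deliverAtMillis) {
            manager.add(position, deliverAtMillis);
        }

        @Override
        public boolean hasMessageAvailable(long nowMillis) {
            return manager.hasDue(nowMillis);
        }
    }

    public static void main(String[] args) {
        TopicManager manager = new TopicManager();
        Tracker subA = manager.createOrGetTracker("sub-a");
        Tracker subB = manager.createOrGetTracker("sub-b");
        // Both subscriptions register the same delayed message.
        subA.addMessage(42L, 1_000L);
        subB.addMessage(42L, 1_000L);
        System.out.println("stored entries: " + manager.storedEntries());
    }
}
```

In this sketch the caller holds only a `Tracker` reference, so swapping the proxy for a different implementation would not require changes on the calling side, which is the property described above.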
In the PIP text I focused on the in-memory implementation (e.g.
`InMemoryTopicDelayedDeliveryTracker*`) because this proposal explicitly keeps
`BucketDelayedDeliveryTracker` out of scope (see the “Out of Scope” section).
The goal of PIP-448 is to reduce the memory footprint of the legacy in-memory
tracker first, without changing any semantics of the persistent, bucket-based
tracker.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.