This is an automated email from the ASF dual-hosted git repository.
daojun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pulsar.git
The following commit(s) were added to refs/heads/master by this push:
new 8548de14c8b [improve][pip] PIP-323: Complete Backlog Quota Telemetry
(#21709)
8548de14c8b is described below
commit 8548de14c8b0ca0f5265e612bf6c7c50f8df894b
Author: Asaf Mesika <[email protected]>
AuthorDate: Wed Dec 20 11:24:44 2023 +0200
[improve][pip] PIP-323: Complete Backlog Quota Telemetry (#21709)
---
pip/pip-323.md | 171 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 171 insertions(+)
diff --git a/pip/pip-323.md b/pip/pip-323.md
new file mode 100644
index 00000000000..dc607fff3d5
--- /dev/null
+++ b/pip/pip-323.md
@@ -0,0 +1,171 @@
+# PIP-323: Complete Backlog Quota Telemetry
+
+# Background knowledge
+
+## Backlog
+
+A topic in Pulsar is the place where messages are written to. They are
consumed by subscriptions. A topic can have many
+subscriptions, and it is those that maintains the state of message
acknowledgment, per subscription - which messages
+were acknowledged and which were not.
+
+A subscription backlog is the set of unacknowledged messages in that
subscription.
+A subscription backlog size is the sum of the size of the unacknowledged
messages (in bytes)..
+
+Since a topic can have many subscriptions, and each has its own backlog, how
does one define a backlog for a topic?
+A topic backlog is defined as the backlog of the subscription which has the
**oldest** unacknowledged message.
+Since acknowledged messages can be interleaved with unacknowledged messages,
calculating the exact size of that
+subscription backlog can be expensive as it requires I/O operations to read
the messages from the ledgers.
+For that reason, the topic backlog size is actually defined to be the
*estimated* backlog size of that subscription.
+It does so by summarizing the size of all the ledgers, starting from the
current active one (the one being written to),
+up to the ledger which contains the oldest unacknowledged message for that
subscription (There is actually a faster
+way to calculate it, but this was the definition chosen for this estimation in
Pulsar).
+
+A topic backlog age is the age of the oldest unacknowledged message (same
subscription as defined for topic backlog size).
+If that message was written 30 minutes ago, its age is 30 minutes, and so is
the topic backlog age.
+
+## Backlog Quota
+
+Pulsar has a feature called [backlog
quota](https://pulsar.apache.org/docs/3.1.x/cookbooks-retention-expiry/#backlog-quotas).
+It allows a user to define a quota - in effect, a limit - which limits the
topic backlog.
+There are two types of quotas:
+
+1. Size based: The limit is for the topic backlog size (as we defined above).
+2. Time based: The limit is for the topic backlog age (as we defined above).
+
+Once a topic backlog exceeds either one of those limits, an action is taken to
hold the backlog to that limit:
+
+* The producer write is placed on hold for a certain amount of time before
failing.
+* The producer write is failed
+* The subscriptions oldest unacknowledged messages will be acknowledged
in-order until both the topic backlog size or
+ age will fall inside the limit (quota). The process is called backlog
eviction (happens every interval).
+
+The quotas can be defined as a default value for any topic, by using the
following broker configuration keys:
+`backlogQuotaDefaultLimitBytes` and `backlogQuotaDefaultLimitSecond`.
+
+The quota can also be specified directly for all topics in a given namespace
using the namespace policy,
+or a specific topic using a topic policy.
+
+## Monitoring Backlog Quota
+
+The user today can calculate quota used for size based limit, since there are
two metrics exposed today on
+a topic level: `pulsar_storage_backlog_quota_limit` and
`pulsar_storage_backlog_size`.
+You can just divide the two to get a percentage and know how close the topic
backlog to its size limit.
+
+For the time-based limit, the only metric exposed today is the quota itself -
`pulsar_storage_backlog_quota_limit_time`
+
+## Backlog Quota Eviction in the Broker
+
+The broker has a method called `BrokerService.monitorBacklogQuota()`. It is
scheduled to run every x seconds,
+as defined by the configuration `backlogQuotaCheckIntervalInSeconds`.
+This method loops over all persistent topics, and for each topic is checks
whether the topic backlog exceeded
+either one of those topics.
+
+As mentioned before, checking backlog size is a memory-only calculation, since
+each topic has the list of ledgers stored in-memory, including the size of
each ledger. Same goes for the subscriptions,
+they are all stored in memory, and the `ManagedCursor` keeps track of the
subscription with the oldest unacknowledged
+message, thus retrieveing it is O(1). Checking backlog based on time is costly
if configuration key
+`preciseTimeBasedBacklogQuotaCheck` was set to true. In that case, it needs to
read the oldest message to obtain
+its public timestamp, which is expensive in terms of I/O. If it was set to
false, it's in-memory access only, since
+it uses the age of the ledger instead of the message, and the ledgers metadata
is kept in memory.
+
+For each topic which has exceeded its quota, if the policy chosen is eviction,
then the process it performed
+synchronously. This process consumes I/O, as it needs read messages (using
skip) to know where to stop acknowledging
+messages.
+
+
+# Motivation
+
+Users which have defined backlog quota based on time, have no means today to
monitor the backlog quota usage,
+time-wise, to know whether the topic backlog is close to its time limit or
even passed it.
+
+If it has passed it, the user has no means to know if it happened, when and
how many times.
+
+
+# Goals
+
+## In Scope
+- Allow the user to know the backlog quota usage for time-based quota, per
topic
+- Allow the user to know how many times backlog eviction happened, and for
which backlog quota type
+
+## Out of Scope
+
+None
+
+
+# High Level Design
+
+We'll use the existing backlog monitoring process running in intervals. For
each topic, the subscription with
+the oldest unacknowledged message is retrieved, to calculate the topic backlog
age. At that point, we will
+cache the following for the oldest unacknowledged message:
+* Subscription name
+* Message position
+* Message publish timestamp
+
+That cache will allow us to add a metric exposing the topic backlog age -
`pulsar_storage_backlog_age_seconds`,
+which will be both consistent (same ones used for deciding on backlog
eviction) and cheap to retrieve
+(no additional I/O involved).
+Coupled with the existing `pulsar_storage_backlog_quota_limit_time` metric,
the user can use both to divide and
+get the usage of the quota (both are in seconds units).
+
+We will add the subscription name containing the oldest unacknowledged message
to the Admin API
+topic stats endpoints (`{tenant}/{namespace}/{topic}/stats` and
`{tenant}/{namespace}/{topic}/partitioned-stats`),
+allowing the user a complete workflow: alert using metrics when topic backlog
is about to be exceeded, then
+query topic stats for that topic to retrieve the subscription name which
contains the oldest message.
+For completeness, we will also add the backlog quota limits, both age and
size, and the age of oldest
+unacknowledged message.
+
+We will add a metric allowing the user to know how many times the usage
exceeded the quota, both for time or size -
+`pulsar_storage_backlog_quota_exceeded_evictions_total`, where the
`quota_type` label will be either `time` or
+`size`. Monitoring that counter over time will allow the user to know when a
topic backlog exceeded its quota,
+and if backlog eviction was chosen as action, then it happened, and how many
times.
+
+Some users may want the backlog quota check to happen more frequently, and as
a consequence, the backlog age
+metric more frequently updated. They can modify
`backlogQuotaCheckIntervalInSeconds` configuration key, but without
+knowing how long this check takes, it will be hard for them. Hence, we will
add the metric
+`pulsar_storage_backlog_quota_check_duration_seconds` which will be of
histogram type.
+
+# Detailed Design
+
+## Public-facing Changes
+
+### Public API
+Adding the following to the response of topic stats, of both
`{tenant}/{namespace}/{topic}/stats`
+and `{tenant}/{namespace}/{topic}/partitioned-stats`:
+
+* `backlogQuotaLimitSize` - the size in bytes of the topic backlog quota
+* `backlogQuotaLimitTime` - the topic backlog age quota, in seconds.
+* `oldestBacklogMessageAgeSeconds` - the age of the oldest unacknowledged
(i.e. backlog) message, measured by
+ the time elapsed from its published time, in seconds. This value is
recorded every backlog quota check
+ interval, hence it represents the value seen in the last check.
+* `oldestBacklogMessageSubscriptionName` - the name of the subscription
containing the oldest unacknowledged message.
+ This value is recorded every backlog quota check interval, hence it
represents the value seen in the last check.
+
+
+### Metrics
+
+| Name | Description
| Attributes | Units |
+|----------------------------------------------------------------|-----------------------------------------------------------------------------------------------------|--------------------------------------------------------|---------|
+| `pulsar_storage_backlog_age_seconds` | Gauge. The
age of the oldest unacknowledged message (backlog)
| cluster, namespace, topic | seconds |
+| `pulsar_storage_backlog_quota_exceeded_evictions_total` | Counter.
The number of times a backlog was evicted since it has exceeded its quota
| cluster, namespace, topic, quota_type = (time \| size) | |
+| `pulsar_storage_backlog_quota_check_duration_seconds` | Histogram.
The duration of the backlog quota check process.
| cluster | seconds |
+| `pulsar_broker_storage_backlog_quota_exceeded_evictions_total` | Counter.
The number of times a backlog was evicted since it has exceeded its quota, in
broker level | cluster, quota_type = (time \| size) |
|
+
+* Since `pulsar_storage_backlog_age_seconds` can not be aggregated, with
proper meaning, to a namespace-level, it will
+ not be included as a metric when configuration key
`exposeTopicLevelMetricsInPrometheus` is set to false.
+* `pulsar_storage_backlog_quota_exceeded_evictions_total` will be included as
a metric also in namespace aggregation.
+
+# Alternatives
+
+One alternative is to separate the backlog quota check into 2 separate
processes, running in their own frequency:
+1. Check backlog quota exceeded for all persistent topics. The result will be
marked in memory.
+ If precise time backlog quota was configured then this will the I/O cost as
described before.
+2. Evict messages for those topics marked.
+
+This *may* enable more frequent updates to the backlog age metric making it
more fresh, but the cost associated with it
+might be high, since it might result in more frequent I/O calls, especially
with many topics.
+Another disadvantage is that it makes the backlog check and eviction more
complex.
+
+# Links
+
+* Mailing List discussion thread:
https://lists.apache.org/thread/xv33xjjzc3t2n06ynz2gmcd4s06ckrqh
+* Mailing List voting thread:
https://lists.apache.org/thread/x2ypnft3x5jdyyxbwgvzxgcw20o44vps