lhotari commented on code in PR #23945:
URL: https://github.com/apache/pulsar/pull/23945#discussion_r1988922335


##########
pip/pip-406.md:
##########
@@ -0,0 +1,152 @@
+# PIP-406: Introduce metrics related to dispatch_throttled_count
+
+# Background knowledge
+
+## Motivation
+
+Currently, users can monitor subscription backlogs using the 
`pulsar_subscription_back_log_no_delayed` metric. 
+However, if [dispatch 
throttling](https://pulsar.apache.org/docs/next/concepts-throttling/) is 
configured at the broker/topic/subscription level,
+this metric may not accurately reflect whether the backlog is due to 
insufficient consumer capacity, as it could be caused by dispatch throttling.
+
+## Goals
+
+Introduce metrics to indicate the count of `messages/bytes throttled` for 
**broker/topic/subscription** level rate limit. 
+This allows users to write PromQL queries to identify subscriptions with high 
backlogs but low or no throttling, pinpointing backlogs caused by insufficient 
consumer capacity.
+
+## In Scope
+
+Broker Level:
+- Introduce the metric `pulsar_broker_dispatch_throttled_msg_count` to 
represent the total count of messages throttled for a broker.
+- Introduce the metric `pulsar_broker_dispatch_throttled_bytes_count` to 
represent the total count of bytes throttled for a broker.
+
+Topic Level:
+- Introduce the metric `pulsar_dispatch_throttled_msg_count` to represent the 
total count of messages throttled for a topic.
+- Introduce the metric `pulsar_dispatch_throttled_bytes_count` to represent 
the total count of bytes throttled for a topic.
+ 
+Subscription Level:
+- Introduce the metric `pulsar_subscription_dispatch_throttled_msg_count` to 
represent the total count of messages throttled for a subscription.
+- Introduce the metric `pulsar_subscription_dispatch_throttled_bytes_count` to 
represent the total count of bytes throttled for a subscription.
+ 
+
+## Out of Scope
+- These states are not persistent and will reset upon broker restart/ topic 
re-load / subscription reconnected.
+
+# High Level Design
+1. Maintain `dispatchThrottleMsgCount` and `dispatchThrottleBytesCount` in 
`DispatchRateLimiter`. Increase these values in the `consumeDispatchQuota` 
method when the TokenBucket for messages or bytes is insufficient.
+2. Output these fields when retrieving metrics.
+
+
+# Detailed Design
+
+## Design & Implementation Details
+1. Maintain `dispatchThrottleMsgCount` and `dispatchThrottleBytesCount` in 
`DispatchRateLimiter`:
+```java
+    private final LongAdder dispatchThrottleMsgCount = new LongAdder();
+    private final LongAdder dispatchThrottleBytesCount = new LongAdder();
+```
+
+2. During each 
[consumeDispatchQuota](https://github.com/apache/pulsar/blob/c4cff0ab3dac169c0a1418ef2f63f61604f6278e/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/DispatchRateLimiter.java#L97-L104),
+if token bucket is insufficient, increase these fields accordingly.
+```diff
+     public void consumeDispatchQuota(long numberOfMessages, long byteSize) {
+         AsyncTokenBucket localDispatchRateLimiterOnMessage = 
dispatchRateLimiterOnMessage;
+         if (numberOfMessages > 0 && localDispatchRateLimiterOnMessage != 
null) {
+-            localDispatchRateLimiterOnMessage.consumeTokens(numberOfMessages);
++            if 
(!localDispatchRateLimiterOnMessage.consumeTokensAndCheckIfContainsTokens(numberOfMessages))
 {
++                dispatchThrottleMsgCount.increment();
++            }
+         }
+         AsyncTokenBucket localDispatchRateLimiterOnByte = 
dispatchRateLimiterOnByte;
+         if (byteSize > 0 && localDispatchRateLimiterOnByte != null) {
+-            localDispatchRateLimiterOnByte.consumeTokens(byteSize);
++            if 
(!localDispatchRateLimiterOnByte.consumeTokensAndCheckIfContainsTokens(byteSize))
 {
++                dispatchThrottleBytesCount.increment();
++            }

Review Comment:
   > hi, @lhotar I agree with you point, but, How can we determine this? 
Initially, the design was as you mentioned (the subscription metrics 
represented by all three(broker/topic/subscription rate limit) dimensions). 
After discussing with @poorbarcode , we realized that this design makes it 
difficult for users to identify which dimension's configuration is affecting 
them.
   
   There seems to be some misunderstanding. That's exactly my point too that it 
would be possible to determine what is causing the issue and when an individual 
subscription has been throttled. You seem to be suggesting that users don't 
need that information. Assuming that all subscriptions get equally throttled is 
something that might not hold. Keeping the metrics also on subscription level 
is a way to address that so that we know exactly and don't have to guess why 
and when a subscription was throttled.
   
   What I'm suggesting is that we would simply keep the throttling even 
information. It's possible to slice-and-dice it for dashboards and show it it 
understandable format. That's the metrics side. Similarly, the topic stats 
information at subscription would contain multiple counters instead of just one 
so that it's possible to know exact the subscription was throttled.
   
   In metrics, it's possible to have a single metric and use labels to capture 
the topic and subscription and reason.
   These labels are there for any subscription metrics in any case. To get 
topic level metrics, you'd just filter the ones where the throttling reason is 
"topic". I'd assume that this is the recommended approach for exposing metrics 
in this case.
   
   > If we obtain the rate trottled metrics for the broker and topic through 
aggregation, it can't accurately represent whether the impact is due to broker 
or topic level configurations.
   
   There's a misunderstanding. All throttling events happen at the subscription 
level. The reason is either broker, topic or subscription, but they all happen 
at the subscription level. That's why it's sufficient to keep 3 counters at 
subscription level and aggregate the topic and broker level counters up to the 
topic and further to the broker level when generating topic / broker stats.
   
   Makes sense?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to