shibd commented on code in PR #23945:
URL: https://github.com/apache/pulsar/pull/23945#discussion_r1986242227


##########
pip/pip-406.md:
##########
@@ -0,0 +1,152 @@
+# PIP-406: Introduce metrics related to dispatch_throttled_count
+
+# Background knowledge
+
+## Motivation
+
+Currently, users can monitor subscription backlogs using the 
`pulsar_subscription_back_log_no_delayed` metric. 
+However, if [dispatch 
throttling](https://pulsar.apache.org/docs/next/concepts-throttling/) is 
configured at the broker/topic/subscription level,
+this metric may not accurately reflect whether the backlog is due to 
insufficient consumer capacity, as it could be caused by dispatch throttling.
+
+## Goals
+
+Introduce metrics to indicate the count of `messages/bytes throttled` for 
**broker/topic/subscription** level rate limit. 
+This allows users to write PromQL queries to identify subscriptions with high 
backlogs but low or no throttling, pinpointing backlogs caused by insufficient 
consumer capacity.
+
+## In Scope
+
+Broker Level:
+- Introduce the metric `pulsar_broker_dispatch_throttled_msg_count` to 
represent the total count of messages throttled for a broker.
+- Introduce the metric `pulsar_broker_dispatch_throttled_bytes_count` to 
represent the total count of bytes throttled for a broker.
+
+Topic Level:
+- Introduce the metric `pulsar_dispatch_throttled_msg_count` to represent the 
total count of messages throttled for a topic.
+- Introduce the metric `pulsar_dispatch_throttled_bytes_count` to represent 
the total count of bytes throttled for a topic.
+ 
+Subscription Level:
+- Introduce the metric `pulsar_subscription_dispatch_throttled_msg_count` to 
represent the total count of messages throttled for a subscription.
+- Introduce the metric `pulsar_subscription_dispatch_throttled_bytes_count` to 
represent the total count of bytes throttled for a subscription.
+ 
+
+## Out of Scope
+- These states are not persistent and will reset upon broker restart/ topic 
re-load / subscription reconnected.
+
+# High Level Design
+1. Maintain `dispatchThrottleMsgCount` and `dispatchThrottleBytesCount` in 
`DispatchRateLimiter`. Increase these values in the `consumeDispatchQuota` 
method when the TokenBucket for messages or bytes is insufficient.
+2. Output these fields when retrieving metrics.
+
+
+# Detailed Design
+
+## Design & Implementation Details
+1. Maintain `dispatchThrottleMsgCount` and `dispatchThrottleBytesCount` in 
`DispatchRateLimiter`:
+```java
+    private final LongAdder dispatchThrottleMsgCount = new LongAdder();
+    private final LongAdder dispatchThrottleBytesCount = new LongAdder();
+```
+
+2. During each 
[consumeDispatchQuota](https://github.com/apache/pulsar/blob/c4cff0ab3dac169c0a1418ef2f63f61604f6278e/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/DispatchRateLimiter.java#L97-L104),
+if token bucket is insufficient, increase these fields accordingly.
+```diff
+     public void consumeDispatchQuota(long numberOfMessages, long byteSize) {
+         AsyncTokenBucket localDispatchRateLimiterOnMessage = 
dispatchRateLimiterOnMessage;
+         if (numberOfMessages > 0 && localDispatchRateLimiterOnMessage != 
null) {
+-            localDispatchRateLimiterOnMessage.consumeTokens(numberOfMessages);
++            if 
(!localDispatchRateLimiterOnMessage.consumeTokensAndCheckIfContainsTokens(numberOfMessages))
 {
++                dispatchThrottleMsgCount.increment();
++            }
+         }
+         AsyncTokenBucket localDispatchRateLimiterOnByte = 
dispatchRateLimiterOnByte;
+         if (byteSize > 0 && localDispatchRateLimiterOnByte != null) {
+-            localDispatchRateLimiterOnByte.consumeTokens(byteSize);
++            if 
(!localDispatchRateLimiterOnByte.consumeTokensAndCheckIfContainsTokens(byteSize))
 {
++                dispatchThrottleBytesCount.increment();
++            }

Review Comment:
   
   
   > This is a valid concern and there are ways to address your concern.
   
   hi, @lhotar I agree with you point, but, How can we determine this? 
Initially, the design was as you mentioned (the subscription metrics 
represented by all three(broker/topic/subscription rate limit) dimensions). 
After discussing with @poorbarcode , we realized that this design makes it 
difficult for users to identify which dimension's configuration is affecting 
them.
   
   
   
   > I don't think that this holds since "when users see that the broker or 
topic level has already been throttled" doesn't make sense in reality. How 
would a user "see" that? Metrics should be capturing the information so that 
there isn't a need to guess.
   
   In the current design, users can monitor metrics for the three mevel. Once 
it hits the expected threshold, they can modify the corresponding 
configuration. If the broker's configuration is low, adjust the broker; if the 
topic's configuration is low, adjust the topic.
   
   
   
   > The metrics should be present to be able to notice whether a subscription 
gets throttled or not. The reason (subscription level, topic level, broker 
level) could be tracked separately. One simple way is to track all counters at 
the subscription level and aggregate them upwards to topic and broker level. 
This way there's no need to duplicate counters at runtime for stats. Many stats 
already have been implemented this way.
   
   > In dashboards, it's possible to calculate sums effectively directly from 
the subscription level metrics. There's no need to have Prometheus or OTel 
metrics for dispatch throttling at topic or broker level when the subscription 
level metric has labels which allow associating the values at topic or broker 
level.
   
   If we obtain the rate trottled metrics for the broker and topic through 
aggregation, it can't accurately represent whether the impact is due to broker 
or topic level configurations.
   
   
   @poorbarcode I’d like to hear your opinion; it seems we’re back to square 
one.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to