poorbarcode commented on code in PR #21488: URL: https://github.com/apache/pulsar/pull/21488#discussion_r1382734671
########## pip/pip-314.md: ########## @@ -0,0 +1,57 @@ +# PIP-314: Add metrics pulsar_subscription_redelivery_messages + +# Background knowledge + +## Delivery of messages in normal + +To simplify the description of the mechanism, let's take the policy [Auto Split Hash Range](https://pulsar.apache.org/docs/3.0.x/concepts-messaging/#auto-split-hash-range) as an example: + +| `0 ~ 16,384` | `16,385 ~ 32,768` | `32,769 ~ 65,536` | +|-------------------|-------------------|--------------------------------| +| ------- C1 ------ | ------- C2 ------ | ------------- C3 ------------- | + +- If the entry key is between `-1(non-include) ~ 16,384(include)`, it is delivered to C1 +- If the entry key is between `16,384(non-include) ~ 32,768(include)`, it is delivered to C2 +- If the entry key is between `32,768(non-include) ~ 65,536(include)`, it is delivered to C3 + +# Motivation + +For the example above, if `C1` is stuck or consumed slowly, the Broker will push the entries that should be delivered to `C1` into a memory collection `redelivery_messages` and read next entries continue, then the collection `redelivery_messages` becomes larger and larger and take up a lot of memory. When sending messages, it will also determine the key of the entries in the collection `redelivery_messages`, affecting performance. + +# Goals +- Add metrics + - Broker level: + - Add a metric `pulsar_broker_max_subscription_redelivery_messages_total` to indicate the max one `{redelivery_messages}` of subscriptions in the broker, used by pushing an alert if it is too large. Nit: The Broker will print a log, which contains the name of the subscription(which has the maximum count of redelivery messages). This will help find the issue subscription. + - Add a metric `pulsar_broker_memory_usage_of_redelivery_messages_bytes` to indicate the memory usage of all `redelivery_messages` in the broker. This is helpful for memory health checks. +- Improve `Topic stats`. + - Add an attribute `redeliveryMessageCount` under `SubscriptionStats` + +Differ between `redelivery_messages` and `pulsar_subscription_unacked_messages & pulsar_subscription_back_log` + +- `pulsar_subscription_unacked_messages`: the messages have been delivered to the client but have not been acknowledged yet. +- `pulsar_subscription_back_log`: how many messages should be acknowledged, contains delivered messages, and the messages which should be delivered. + +### Public API + +<strong>SubscriptionStats.java</strong> +```java +long getRedeliveryMessageCount(); +``` + +### Metrics + +**pulsar_broker_max_subscription_redelivery_messages_total** +- Description: the max one of `{pulsar_broker_memory_usage_of_redelivery_messages_bytes}` maintains by subscriptions in the broker. +- Attributes: `[cluster]` +- Unit: `Counter` + +**pulsar_broker_memory_usage_of_redelivery_messages_bytes** +- Description: the memory usage of all `redelivery_messages` in the broker. +- Attributes: `[cluster]` +- Unit: `Counter` Review Comment: Sorry, I misunderstood, it has been changed to Guage ########## pip/pip-314.md: ########## @@ -0,0 +1,57 @@ +# PIP-314: Add metrics pulsar_subscription_redelivery_messages + +# Background knowledge + +## Delivery of messages in normal + +To simplify the description of the mechanism, let's take the policy [Auto Split Hash Range](https://pulsar.apache.org/docs/3.0.x/concepts-messaging/#auto-split-hash-range) as an example: + +| `0 ~ 16,384` | `16,385 ~ 32,768` | `32,769 ~ 65,536` | +|-------------------|-------------------|--------------------------------| +| ------- C1 ------ | ------- C2 ------ | ------------- C3 ------------- | + +- If the entry key is between `-1(non-include) ~ 16,384(include)`, it is delivered to C1 +- If the entry key is between `16,384(non-include) ~ 32,768(include)`, it is delivered to C2 +- If the entry key is between `32,768(non-include) ~ 65,536(include)`, it is delivered to C3 + +# Motivation + +For the example above, if `C1` is stuck or consumed slowly, the Broker will push the entries that should be delivered to `C1` into a memory collection `redelivery_messages` and read next entries continue, then the collection `redelivery_messages` becomes larger and larger and take up a lot of memory. When sending messages, it will also determine the key of the entries in the collection `redelivery_messages`, affecting performance. + +# Goals +- Add metrics + - Broker level: + - Add a metric `pulsar_broker_max_subscription_redelivery_messages_total` to indicate the max one `{redelivery_messages}` of subscriptions in the broker, used by pushing an alert if it is too large. Nit: The Broker will print a log, which contains the name of the subscription(which has the maximum count of redelivery messages). This will help find the issue subscription. + - Add a metric `pulsar_broker_memory_usage_of_redelivery_messages_bytes` to indicate the memory usage of all `redelivery_messages` in the broker. This is helpful for memory health checks. +- Improve `Topic stats`. + - Add an attribute `redeliveryMessageCount` under `SubscriptionStats` + +Differ between `redelivery_messages` and `pulsar_subscription_unacked_messages & pulsar_subscription_back_log` + +- `pulsar_subscription_unacked_messages`: the messages have been delivered to the client but have not been acknowledged yet. +- `pulsar_subscription_back_log`: how many messages should be acknowledged, contains delivered messages, and the messages which should be delivered. + +### Public API + +<strong>SubscriptionStats.java</strong> +```java +long getRedeliveryMessageCount(); +``` + +### Metrics + +**pulsar_broker_max_subscription_redelivery_messages_total** +- Description: the max one of `{pulsar_broker_memory_usage_of_redelivery_messages_bytes}` maintains by subscriptions in the broker. +- Attributes: `[cluster]` +- Unit: `Counter` Review Comment: Sorry, I misunderstood, it has been changed to Guage -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
