congbobo184 commented on code in PR #20859:
URL: https://github.com/apache/pulsar/pull/20859#discussion_r1274658021


##########
pip/pip-285.md:
##########
@@ -0,0 +1,73 @@
+# Background knowledge

Review Comment:
   Sure, I will delete the Background section.



##########
pip/pip-285.md:
##########
@@ -0,0 +1,73 @@
+# Background knowledge
+Existing monitoring items in Pulsar, such as "pulsar_subscription_back_log" 
and "pulsar_subscription_back_log_no_delayed," provide valuable insights into 
the quantity of backlogged messages. However, they lack a metric that directly 
measures the duration of message backlog. Monitoring the duration of backlog is 
vital as it allows us to understand the persistence of message accumulation 
within a subscription over time.
+
+# Motivation
+
+The motivation behind introducing the new monitoring item 
"pulsar_subscription_backlog_duration" is to effectively monitor the health of 
subscriptions within the Pulsar messaging system. This health metric represents 
whether there are messages that have not been successfully acknowledged (ACKed) 
and potential consumer-side issues. Additionally, this monitoring item allows 
us to configure alerting mechanisms, ensuring timely notifications to users, 
thereby facilitating proactive response to potential issues.

Review Comment:
   right



##########
pip/pip-285.md:
##########
@@ -0,0 +1,73 @@
+# Background knowledge
+Existing monitoring items in Pulsar, such as "pulsar_subscription_back_log" 
and "pulsar_subscription_back_log_no_delayed," provide valuable insights into 
the quantity of backlogged messages. However, they lack a metric that directly 
measures the duration of message backlog. Monitoring the duration of backlog is 
vital as it allows us to understand the persistence of message accumulation 
within a subscription over time.
+
+# Motivation
+
+The motivation behind introducing the new monitoring item 
"pulsar_subscription_backlog_duration" is to effectively monitor the health of 
subscriptions within the Pulsar messaging system. This health metric represents 
whether there are messages that have not been successfully acknowledged (ACKed) 
and potential consumer-side issues. Additionally, this monitoring item allows 
us to configure alerting mechanisms, ensuring timely notifications to users, 
thereby facilitating proactive response to potential issues.
+
+Maintaining the health of subscriptions is of paramount importance for the 
smooth operation of a messaging system. As message delivery involves 
interaction between producers and consumers, backlogs or unacknowledged 
messages can lead to data delays or losses. By monitoring 
"pulsar_subscription_backlog_duration," we can gain real-time insights into the 
duration of message backlogs and promptly detect any potential processing 
issues within subscriptions.
+
+The configuration and alerting settings for this monitoring item play a 
crucial role in responding swiftly to issues. When 
"pulsar_subscription_backlog_duration" indicates an abnormal increase in 
duration or unusual message backlogs, system administrators receive immediate 
alert notifications. These alerts enable administrators to quickly identify 
problems and take necessary measures to prevent message losses or further 
delays.
+
+In conclusion, the introduction of the "pulsar_subscription_backlog_duration" 
monitoring item enables effective monitoring of subscription health, real-time 
issue detection, and prevention of message delivery delays and losses. 
Additionally, timely alerting mechanisms empower proactive responses, ensuring 
the reliability and efficiency of the messaging system. This is essential for 
providing high-quality message delivery services, ensuring user experiences, 
and maintaining data integrity.
+
+# Goals
+
+## In Scope
+
+* SubscriptionStatsImpl add this stat
+* Metrics
+
+
+## Out of Scope
+
+* Implementing changes to the core functionality of the Pulsar messaging 
system itself.
+* Not include `NonPersistentTopic`.
+* Not include `DelayMessage`
+
+# High Level Design
+
+* add config `subscriptionBacklogDurationEnabled` in `broker.conf`
+  * note: because we need to read the markDelete position next position 
message, it will consume performance when the message is not in the cache, so 
add this flag
+* `SubscriptionStatsImpl` add `backlogDuration` variable
+* `AggregatedSubscriptionStats` add `backlogDuration` variable
+* add metric iterm named `pulsar_subscription_back_log_duration`
+
+# Detailed Design
+
+
+## Design & Implementation Details
+* when `PersistentSubscription` invoke getStats then reade the (`markDelete` + 
1) entry convert to `MessageMetadata` `publish_time` to represent the 
`earliestUnAckMessagePublishTime`
+* use currentTime - `earliestUnAckMessagePublishTime` represent 
`backlogDuration`
+* if markDelete haven't changed, don't need to get the new 
`earliestUnAckMessagePublishTime`, use directly, to reduce the read entry op
+
+## Public-facing Changes
+
+
+### Configuration
+
+* add config `subscriptionBacklogDurationEnabled` in `broker.conf`

Review Comment:
   good suggestion



##########
pip/pip-285.md:
##########
@@ -0,0 +1,73 @@
+# Background knowledge
+Existing monitoring items in Pulsar, such as "pulsar_subscription_back_log" 
and "pulsar_subscription_back_log_no_delayed," provide valuable insights into 
the quantity of backlogged messages. However, they lack a metric that directly 
measures the duration of message backlog. Monitoring the duration of backlog is 
vital as it allows us to understand the persistence of message accumulation 
within a subscription over time.

Review Comment:
   This metric needs to read the entry at the position after the markDelete 
position. When that entry is not in the cache, it is read from the bookie, 
which costs performance. So the metric is generated only when it is requested.
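
   The lazy computation described above can be sketched roughly as follows. 
This is only an illustration, not the actual broker code: the class 
`BacklogDurationSketch`, the `publishTimeReader` callback, and the use of a 
plain `long` as a position are all assumptions made for the example (real 
Pulsar positions are ledger/entry pairs). The reader callback stands in for 
the possibly expensive entry read that may go to the bookie on a cache miss.

```java
import java.util.function.LongUnaryOperator;

/** Hypothetical sketch of the proposed stat; names are not Pulsar's real API. */
public class BacklogDurationSketch {
    // Stands in for the proposed `subscriptionBacklogDurationEnabled` flag.
    private final boolean enabled;
    // Maps an entry position to its publish_time in ms, or -1 if no such entry.
    // In a broker this read could miss the cache and reach the bookie.
    private final LongUnaryOperator publishTimeReader;

    private long cachedMarkDelete = Long.MIN_VALUE;
    private long cachedPublishTime = -1L;
    private int reads = 0;

    public BacklogDurationSketch(boolean enabled, LongUnaryOperator publishTimeReader) {
        this.enabled = enabled;
        this.publishTimeReader = publishTimeReader;
    }

    /** Called only when stats are requested. Returns -1 when the flag is off. */
    public long backlogDurationMillis(long markDelete, long nowMillis) {
        if (!enabled) {
            return -1L;
        }
        // Re-read the entry only if markDelete moved since the last call.
        if (markDelete != cachedMarkDelete) {
            cachedPublishTime = publishTimeReader.applyAsLong(markDelete + 1);
            cachedMarkDelete = markDelete;
            reads++;
        }
        // No entry after markDelete means there is no backlog.
        if (cachedPublishTime < 0) {
            return 0L;
        }
        return nowMillis - cachedPublishTime;
    }

    /** Number of entry reads performed so far (for illustration only). */
    public int readCount() {
        return reads;
    }
}
```

   With this shape, repeated stats calls for a stuck subscription reuse the 
cached publish time, and the entry is read again only after markDelete 
advances.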



##########
pip/pip-285.md:
##########
@@ -0,0 +1,73 @@
+# Background knowledge
+Existing monitoring items in Pulsar, such as "pulsar_subscription_back_log" 
and "pulsar_subscription_back_log_no_delayed," provide valuable insights into 
the quantity of backlogged messages. However, they lack a metric that directly 
measures the duration of message backlog. Monitoring the duration of backlog is 
vital as it allows us to understand the persistence of message accumulation 
within a subscription over time.
+
+# Motivation
+
+The motivation behind introducing the new monitoring item 
"pulsar_subscription_backlog_duration" is to effectively monitor the health of 
subscriptions within the Pulsar messaging system. This health metric represents 
whether there are messages that have not been successfully acknowledged (ACKed) 
and potential consumer-side issues. Additionally, this monitoring item allows 
us to configure alerting mechanisms, ensuring timely notifications to users, 
thereby facilitating proactive response to potential issues.
+
+Maintaining the health of subscriptions is of paramount importance for the 
smooth operation of a messaging system. As message delivery involves 
interaction between producers and consumers, backlogs or unacknowledged 
messages can lead to data delays or losses. By monitoring 
"pulsar_subscription_backlog_duration," we can gain real-time insights into the 
duration of message backlogs and promptly detect any potential processing 
issues within subscriptions.
+
+The configuration and alerting settings for this monitoring item play a 
crucial role in responding swiftly to issues. When 
"pulsar_subscription_backlog_duration" indicates an abnormal increase in 
duration or unusual message backlogs, system administrators receive immediate 
alert notifications. These alerts enable administrators to quickly identify 
problems and take necessary measures to prevent message losses or further 
delays.
+
+In conclusion, the introduction of the "pulsar_subscription_backlog_duration" 
monitoring item enables effective monitoring of subscription health, real-time 
issue detection, and prevention of message delivery delays and losses. 
Additionally, timely alerting mechanisms empower proactive responses, ensuring 
the reliability and efficiency of the messaging system. This is essential for 
providing high-quality message delivery services, ensuring user experiences, 
and maintaining data integrity.
+
+# Goals
+
+## In Scope
+
+* SubscriptionStatsImpl add this stat
+* Metrics
+
+
+## Out of Scope
+
+* Implementing changes to the core functionality of the Pulsar messaging 
system itself.
+* Not include `NonPersistentTopic`.
+* Not include `DelayMessage`
+
+# High Level Design
+
+* add config `subscriptionBacklogDurationEnabled` in `broker.conf`
+  * note: because we need to read the markDelete position next position 
message, it will consume performance when the message is not in the cache, so 
add this flag
+* `SubscriptionStatsImpl` add `backlogDuration` variable
+* `AggregatedSubscriptionStats` add `backlogDuration` variable
+* add metric iterm named `pulsar_subscription_back_log_duration`
+
+# Detailed Design
+
+
+## Design & Implementation Details
+* when `PersistentSubscription` invoke getStats then reade the (`markDelete` + 
1) entry convert to `MessageMetadata` `publish_time` to represent the 
`earliestUnAckMessagePublishTime`
+* use currentTime - `earliestUnAckMessagePublishTime` represent 
`backlogDuration`
+* if markDelete haven't changed, don't need to get the new 
`earliestUnAckMessagePublishTime`, use directly, to reduce the read entry op
+
+## Public-facing Changes

Review Comment:
   oh, sure. 



##########
pip/pip-285.md:
##########
@@ -0,0 +1,73 @@
+# Background knowledge
+Existing monitoring items in Pulsar, such as "pulsar_subscription_back_log" 
and "pulsar_subscription_back_log_no_delayed," provide valuable insights into 
the quantity of backlogged messages. However, they lack a metric that directly 
measures the duration of message backlog. Monitoring the duration of backlog is 
vital as it allows us to understand the persistence of message accumulation 
within a subscription over time.
+
+# Motivation
+
+The motivation behind introducing the new monitoring item 
"pulsar_subscription_backlog_duration" is to effectively monitor the health of 
subscriptions within the Pulsar messaging system. This health metric represents 
whether there are messages that have not been successfully acknowledged (ACKed) 
and potential consumer-side issues. Additionally, this monitoring item allows 
us to configure alerting mechanisms, ensuring timely notifications to users, 
thereby facilitating proactive response to potential issues.
+
+Maintaining the health of subscriptions is of paramount importance for the 
smooth operation of a messaging system. As message delivery involves 
interaction between producers and consumers, backlogs or unacknowledged 
messages can lead to data delays or losses. By monitoring 
"pulsar_subscription_backlog_duration," we can gain real-time insights into the 
duration of message backlogs and promptly detect any potential processing 
issues within subscriptions.
+
+The configuration and alerting settings for this monitoring item play a 
crucial role in responding swiftly to issues. When 
"pulsar_subscription_backlog_duration" indicates an abnormal increase in 
duration or unusual message backlogs, system administrators receive immediate 
alert notifications. These alerts enable administrators to quickly identify 
problems and take necessary measures to prevent message losses or further 
delays.
+
+In conclusion, the introduction of the "pulsar_subscription_backlog_duration" 
monitoring item enables effective monitoring of subscription health, real-time 
issue detection, and prevention of message delivery delays and losses. 
Additionally, timely alerting mechanisms empower proactive responses, ensuring 
the reliability and efficiency of the messaging system. This is essential for 
providing high-quality message delivery services, ensuring user experiences, 
and maintaining data integrity.
+
+# Goals
+
+## In Scope
+
+* SubscriptionStatsImpl add this stat
+* Metrics
+
+
+## Out of Scope
+
+* Implementing changes to the core functionality of the Pulsar messaging 
system itself.
+* Not include `NonPersistentTopic`.
+* Not include `DelayMessage`
+
+# High Level Design
+
+* add config `subscriptionBacklogDurationEnabled` in `broker.conf`
+  * note: because we need to read the markDelete position next position 
message, it will consume performance when the message is not in the cache, so 
add this flag
+* `SubscriptionStatsImpl` add `backlogDuration` variable
+* `AggregatedSubscriptionStats` add `backlogDuration` variable
+* add metric iterm named `pulsar_subscription_back_log_duration`
+
+# Detailed Design
+
+
+## Design & Implementation Details
+* when `PersistentSubscription` invoke getStats then reade the (`markDelete` + 
1) entry convert to `MessageMetadata` `publish_time` to represent the 
`earliestUnAckMessagePublishTime`
+* use currentTime - `earliestUnAckMessagePublishTime` represent 
`backlogDuration`
+* if markDelete haven't changed, don't need to get the new 
`earliestUnAckMessagePublishTime`, use directly, to reduce the read entry op
+
+## Public-facing Changes
+
+
+### Configuration
+
+* add config `subscriptionBacklogDurationEnabled` in `broker.conf`
+  * note: because we need to read the markDelete position next position 
message, it will consume performance when the message is not in the cache, so 
add the config
+
+# Backward & Forward Compatability
+
+## Revert
+
+* config `subscriptionBacklogDurationEnabled = false` in `broker.conf`
+* lower the broker version
+
+## Upgrade
+
+* config `subscriptionBacklogDurationEnabled = true` in `broker.conf`
+
+# Alternatives
+
+* marDeletePosition changed every time change the 
`earliestUnAckMessagePublishTime`, It will be very frequent and consume 
performance

Review Comment:
   Yes, that is why it is listed as an alternative. For now I can't think of 
anything better.



##########
pip/pip-285.md:
##########
@@ -0,0 +1,73 @@
+# Background knowledge
+Existing monitoring items in Pulsar, such as "pulsar_subscription_back_log" 
and "pulsar_subscription_back_log_no_delayed," provide valuable insights into 
the quantity of backlogged messages. However, they lack a metric that directly 
measures the duration of message backlog. Monitoring the duration of backlog is 
vital as it allows us to understand the persistence of message accumulation 
within a subscription over time.
+
+# Motivation
+
+The motivation behind introducing the new monitoring item 
"pulsar_subscription_backlog_duration" is to effectively monitor the health of 
subscriptions within the Pulsar messaging system. This health metric represents 
whether there are messages that have not been successfully acknowledged (ACKed) 
and potential consumer-side issues. Additionally, this monitoring item allows 
us to configure alerting mechanisms, ensuring timely notifications to users, 
thereby facilitating proactive response to potential issues.
+
+Maintaining the health of subscriptions is of paramount importance for the 
smooth operation of a messaging system. As message delivery involves 
interaction between producers and consumers, backlogs or unacknowledged 
messages can lead to data delays or losses. By monitoring 
"pulsar_subscription_backlog_duration," we can gain real-time insights into the 
duration of message backlogs and promptly detect any potential processing 
issues within subscriptions.
+
+The configuration and alerting settings for this monitoring item play a 
crucial role in responding swiftly to issues. When 
"pulsar_subscription_backlog_duration" indicates an abnormal increase in 
duration or unusual message backlogs, system administrators receive immediate 
alert notifications. These alerts enable administrators to quickly identify 
problems and take necessary measures to prevent message losses or further 
delays.
+
+In conclusion, the introduction of the "pulsar_subscription_backlog_duration" 
monitoring item enables effective monitoring of subscription health, real-time 
issue detection, and prevention of message delivery delays and losses. 
Additionally, timely alerting mechanisms empower proactive responses, ensuring 
the reliability and efficiency of the messaging system. This is essential for 
providing high-quality message delivery services, ensuring user experiences, 
and maintaining data integrity.
+
+# Goals
+
+## In Scope
+
+* SubscriptionStatsImpl add this stat
+* Metrics
+
+
+## Out of Scope
+
+* Implementing changes to the core functionality of the Pulsar messaging 
system itself.
+* Not include `NonPersistentTopic`.
+* Not include `DelayMessage`
+
+# High Level Design
+
+* add config `subscriptionBacklogDurationEnabled` in `broker.conf`
+  * note: because we need to read the markDelete position next position 
message, it will consume performance when the message is not in the cache, so 
add this flag
+* `SubscriptionStatsImpl` add `backlogDuration` variable
+* `AggregatedSubscriptionStats` add `backlogDuration` variable
+* add metric iterm named `pulsar_subscription_back_log_duration`
+
+# Detailed Design
+
+
+## Design & Implementation Details
+* when `PersistentSubscription` invoke getStats then reade the (`markDelete` + 
1) entry convert to `MessageMetadata` `publish_time` to represent the 
`earliestUnAckMessagePublishTime`

Review Comment:
   This would cause a large error, make alerts much less accurate, and 
affect the user's judgment. If the user's consumer application is working 
normally, the read will hit the cache, so the performance overhead is 
controllable.



##########
pip/pip-285.md:
##########
@@ -0,0 +1,73 @@
+# Background knowledge
+Existing monitoring items in Pulsar, such as "pulsar_subscription_back_log" 
and "pulsar_subscription_back_log_no_delayed," provide valuable insights into 
the quantity of backlogged messages. However, they lack a metric that directly 
measures the duration of message backlog. Monitoring the duration of backlog is 
vital as it allows us to understand the persistence of message accumulation 
within a subscription over time.
+
+# Motivation
+
+The motivation behind introducing the new monitoring item 
"pulsar_subscription_backlog_duration" is to effectively monitor the health of 
subscriptions within the Pulsar messaging system. This health metric represents 
whether there are messages that have not been successfully acknowledged (ACKed) 
and potential consumer-side issues. Additionally, this monitoring item allows 
us to configure alerting mechanisms, ensuring timely notifications to users, 
thereby facilitating proactive response to potential issues.
+
+Maintaining the health of subscriptions is of paramount importance for the 
smooth operation of a messaging system. As message delivery involves 
interaction between producers and consumers, backlogs or unacknowledged 
messages can lead to data delays or losses. By monitoring 
"pulsar_subscription_backlog_duration," we can gain real-time insights into the 
duration of message backlogs and promptly detect any potential processing 
issues within subscriptions.
+
+The configuration and alerting settings for this monitoring item play a 
crucial role in responding swiftly to issues. When 
"pulsar_subscription_backlog_duration" indicates an abnormal increase in 
duration or unusual message backlogs, system administrators receive immediate 
alert notifications. These alerts enable administrators to quickly identify 
problems and take necessary measures to prevent message losses or further 
delays.
+
+In conclusion, the introduction of the "pulsar_subscription_backlog_duration" 
monitoring item enables effective monitoring of subscription health, real-time 
issue detection, and prevention of message delivery delays and losses. 
Additionally, timely alerting mechanisms empower proactive responses, ensuring 
the reliability and efficiency of the messaging system. This is essential for 
providing high-quality message delivery services, ensuring user experiences, 
and maintaining data integrity.
+
+# Goals
+
+## In Scope
+
+* SubscriptionStatsImpl add this stat
+* Metrics
+
+
+## Out of Scope
+
+* Implementing changes to the core functionality of the Pulsar messaging 
system itself.
+* Not include `NonPersistentTopic`.
+* Not include `DelayMessage`
+
+# High Level Design
+
+* add config `subscriptionBacklogDurationEnabled` in `broker.conf`
+  * note: because we need to read the markDelete position next position 
message, it will consume performance when the message is not in the cache, so 
add this flag
+* `SubscriptionStatsImpl` add `backlogDuration` variable
+* `AggregatedSubscriptionStats` add `backlogDuration` variable
+* add metric iterm named `pulsar_subscription_back_log_duration`
+
+# Detailed Design
+
+
+## Design & Implementation Details
+* when `PersistentSubscription` invoke getStats then reade the (`markDelete` + 
1) entry convert to `MessageMetadata` `publish_time` to represent the 
`earliestUnAckMessagePublishTime`
+* use currentTime - `earliestUnAckMessagePublishTime` represent 
`backlogDuration`
+* if markDelete haven't changed, don't need to get the new 
`earliestUnAckMessagePublishTime`, use directly, to reduce the read entry op
+
+## Public-facing Changes
+
+
+### Configuration
+
+* add config `subscriptionBacklogDurationEnabled` in `broker.conf`
+  * note: because we need to read the markDelete position next position 
message, it will consume performance when the message is not in the cache, so 
add the config
+
+# Backward & Forward Compatability
+
+## Revert
+
+* config `subscriptionBacklogDurationEnabled = false` in `broker.conf`
+* lower the broker version
+
+## Upgrade
+
+* config `subscriptionBacklogDurationEnabled = true` in `broker.conf`
+
+# Alternatives
+
+* marDeletePosition changed every time change the 
`earliestUnAckMessagePublishTime`, It will be very frequent and consume 
performance
+
+# General Notes
+* If there are a large number of subscriptions, and markDelete postion + 1 
does not exist in the cache, it may consume bookie performance

Review Comment:
   yes, this situation exists



##########
pip/pip-285.md:
##########
@@ -0,0 +1,73 @@
+# Background knowledge
+Existing monitoring items in Pulsar, such as "pulsar_subscription_back_log" 
and "pulsar_subscription_back_log_no_delayed," provide valuable insights into 
the quantity of backlogged messages. However, they lack a metric that directly 
measures the duration of message backlog. Monitoring the duration of backlog is 
vital as it allows us to understand the persistence of message accumulation 
within a subscription over time.
+
+# Motivation
+
+The motivation behind introducing the new monitoring item 
"pulsar_subscription_backlog_duration" is to effectively monitor the health of 
subscriptions within the Pulsar messaging system. This health metric represents 
whether there are messages that have not been successfully acknowledged (ACKed) 
and potential consumer-side issues. Additionally, this monitoring item allows 
us to configure alerting mechanisms, ensuring timely notifications to users, 
thereby facilitating proactive response to potential issues.

Review Comment:
   That might be useful, but this PIP covers subscriptions only.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
