lhotari opened a new issue, #23205: URL: https://github.com/apache/pulsar/issues/23205
### Search before asking

- [X] I searched in the [issues](https://github.com/apache/pulsar/issues) and found nothing similar.

### Motivation

Currently, it's very challenging to investigate issues related to message replay (the "message redelivery controller"). Examples include:

- The "repeated Read-and-discard when using Key_Shared mode" issue, mitigated by:
  - https://github.com/apache/pulsar/pull/22245
  - https://github.com/apache/pulsar/pull/21739
  - An older mitigation: #7105

### Solution

Add topic stats and metrics for observing message replay and the related Key_Shared filtering (hash blocking) behavior.

### Specific Metrics to Consider

1. Number of messages in redelivery (replay)
2. For Key_Shared subscriptions: ways to observe internal state related to blocked hashes
3. Counter for delayed delivery messages being added to delivery (replay)

### Implementation Requirements

- It should be possible to detect replays in topic stats (or internal stats) and also in aggregated metrics
- The aggregated metrics should be usable in monitoring tools (e.g., Grafana dashboards)
- The specific types of metrics (counters, gauges) to be used will be determined in the detailed design phase

### Expected Benefits

- Improved observability for message replay and Key_Shared behavior
- Easier troubleshooting of related issues
- Enhanced monitoring capabilities for Pulsar clusters

### Alternatives

_No response_

### Anything else?

_No response_

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!
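The three metrics under "Specific Metrics to Consider" map naturally onto counters and gauges. As a minimal sketch only, assuming hypothetical hook names (`onAddedToReplay`, `onReplayed`, `onHashBlocked`, `onHashUnblocked`) and a simplified `long` message position instead of Pulsar's ledger/entry pair, per-subscription tracking could look like the following. None of these names are part of Pulsar's API, and the actual metric types are left to the detailed design phase.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical per-subscription replay statistics (illustrative names only,
 * not Pulsar's API). Positions are simplified to a single long.
 */
public class ReplayStatsSketch {
    // Counter: total messages ever added to the replay (redelivery) set.
    private final LongAdder messagesAddedToReplay = new LongAdder();
    // Counter: delayed-delivery messages that became due and were added to replay.
    private final LongAdder delayedMessagesAddedToReplay = new LongAdder();
    // Gauge source: positions currently waiting in the replay set.
    private final Set<Long> pendingReplay = ConcurrentHashMap.newKeySet();
    // Gauge source: Key_Shared sticky-key hashes currently blocked.
    private final Set<Integer> blockedHashes = ConcurrentHashMap.newKeySet();

    /** Records a position entering replay; a position already pending is not counted twice. */
    public void onAddedToReplay(long position, boolean fromDelayedDelivery) {
        if (pendingReplay.add(position)) {
            messagesAddedToReplay.increment();
            if (fromDelayedDelivery) {
                delayedMessagesAddedToReplay.increment();
            }
        }
    }

    /** Records that a pending position was dispatched and left the replay set. */
    public void onReplayed(long position) {
        pendingReplay.remove(position);
    }

    public void onHashBlocked(int hash) {
        blockedHashes.add(hash);
    }

    public void onHashUnblocked(int hash) {
        blockedHashes.remove(hash);
    }

    public long messagesAddedToReplayTotal() { return messagesAddedToReplay.sum(); }
    public long delayedMessagesAddedToReplayTotal() { return delayedMessagesAddedToReplay.sum(); }
    public int messagesInReplay() { return pendingReplay.size(); }
    public int blockedHashCount() { return blockedHashes.size(); }

    public static void main(String[] args) {
        ReplayStatsSketch stats = new ReplayStatsSketch();
        stats.onAddedToReplay(1L, false);
        stats.onAddedToReplay(2L, true);
        stats.onAddedToReplay(2L, true); // already pending, so not counted again
        stats.onReplayed(1L);
        stats.onHashBlocked(42);
        // prints "added=2 delayed=1 inReplay=1 blockedHashes=1"
        System.out.printf("added=%d delayed=%d inReplay=%d blockedHashes=%d%n",
                stats.messagesAddedToReplayTotal(),
                stats.delayedMessagesAddedToReplayTotal(),
                stats.messagesInReplay(),
                stats.blockedHashCount());
    }
}
```

The `LongAdder` totals only ever increase, which suits rate queries in a Grafana dashboard, while the set sizes behave as gauges reflecting current state. Deduplicating by pending position is one design choice; counting every redelivery attempt would be another.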
