sijie commented on issue #6518: health monitoring, alarms URL: https://github.com/apache/pulsar/issues/6518#issuecomment-604309990 @ilyam8 it is just an example for your reference. Most people define their alerting rules based on the metrics listed at https://pulsar.apache.org/docs/en/reference-metrics/. Some care about write latency, others about the backlog. If you are looking for failure-rate metrics, currently only BookKeeper exposes metrics about "success" and "failures"; you can use them to calculate the failure rate across the cluster. Brokers currently don't have such metrics. We can look into adding them.
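
To illustrate the "calculate the rate across the cluster" idea, here is a minimal sketch in Python. It assumes you have already scraped cumulative success and failure counters from each bookie at two points in time; the `BookieSample` type, the bookie ids, and the function name are hypothetical and not part of BookKeeper or Pulsar. The actual metric names should be taken from the reference-metrics page.

```python
from dataclasses import dataclass


@dataclass
class BookieSample:
    """Cumulative counters scraped from one bookie (hypothetical shape)."""
    success: int
    failure: int


def cluster_failure_rate(previous, current):
    """Fraction of failed operations across all bookies between two scrapes.

    `previous` and `current` map a bookie id to its BookieSample.
    Returns None when no operations were observed in the window.
    """
    ok = failed = 0
    for bookie, now in current.items():
        before = previous.get(bookie)
        if before is None:
            continue  # bookie first seen in this scrape; no delta available yet
        # Clamp at zero so a counter reset (e.g. a restart) is not counted as negative.
        ok += max(now.success - before.success, 0)
        failed += max(now.failure - before.failure, 0)
    total = ok + failed
    return failed / total if total else None


# Made-up numbers for illustration only.
previous = {"bookie-1": BookieSample(100, 2), "bookie-2": BookieSample(200, 0)}
current = {"bookie-1": BookieSample(190, 12), "bookie-2": BookieSample(300, 0)}
rate = cluster_failure_rate(previous, current)
```

An alerting rule could then compare `rate` against a threshold you choose. In a Prometheus setup the same calculation would more likely live in an alerting rule expression than in application code.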