frankjkelly opened a new issue #13971: URL: https://github.com/apache/pulsar/issues/13971
**Is your feature request related to a problem? Please describe.**

Recently two of our 9 bookies were quarantined, and we did not know until we saw these logs in the broker:

```
platform-pulsar-broker-1 platform-pulsar-broker 16:53:05.284 [BookKeeperClientScheduler-OrderedScheduler-0-0] WARN org.apache.bookkeeper.client.BookieWatcherImpl - Bookie platform-pulsar-bookkeeper-1.platform-pulsar-bookkeeper.cogito-load.svc.cluster.local:3181 has been quarantined because of read/write errors.
platform-pulsar-broker-1 platform-pulsar-broker 16:53:05.284 [BookKeeperClientScheduler-OrderedScheduler-0-0] WARN org.apache.bookkeeper.client.BookieWatcherImpl - Bookie platform-pulsar-bookkeeper-8.platform-pulsar-bookkeeper.cogito-load.svc.cluster.local:3181 has been quarantined because of read/write errors.
```

When these bookies are quarantined, the overall throughput scalability of our system is reduced. We would like some way to monitor and alert when a bookie is quarantined.

**Describe the solution you'd like**

We'd like a Prometheus metric that indicates the number of bookies that are currently quarantined. Once that metric is added, we can use it in Prometheus alerting to notify us of the problem.

**Describe alternatives you've considered**

Can't think of any.
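Until such a metric exists, one possible stopgap is to scan broker logs for the quarantine warning. The following is only a minimal sketch, assuming the message wording matches the WARN lines quoted above; the function name `quarantined_bookies` is hypothetical, and the pattern may need adjusting for other BookKeeper versions. It only detects that a bookie was reported as quarantined; it does not know when the bookie leaves quarantine, so it is not a substitute for a "currently quarantined" gauge.

```python
import re

# Pattern derived only from the WARN lines quoted in this issue; this is an
# assumption about the log format, not a documented BookKeeper contract.
QUARANTINE_RE = re.compile(
    r"Bookie (?P<bookie>\S+) has been quarantined because of read/write errors"
)


def quarantined_bookies(log_lines):
    """Return the set of bookie addresses reported as quarantined in log_lines."""
    found = set()
    for line in log_lines:
        match = QUARANTINE_RE.search(line)
        if match:
            found.add(match.group("bookie"))
    return found
```

The function accepts any iterable of strings, so it could be fed an open log file or lines streamed from a log collector, and an alert could be raised whenever the returned set is non-empty.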
