GitHub user justin-lathrop edited a discussion: Pulsar upgrade to 3.0.5 causes prometheus metrics timeouts on brokers
After upgrading from Pulsar 2.9.4 to Pulsar 3.0.5 in a Kubernetes cluster using the Pulsar Helm chart 3.0, Prometheus metrics stopped working on the pulsar-brokers. The pulsar-broker logs show a constant stream of 302 redirects and 500 responses with timeouts.

```
... INFO org.eclipse.jetty.server.RequestLog - ... "GET /metrics HTTP/1.1" 302 0 "-" "Prometheus/2.42.0" 0
... WARN org.apache.pulsar.broker.stats.prometheus.PulsarPrometheusMetricsServlet - Prometheus metrics request timed out
... INFO org.eclipse.jetty.server.RequestLog - ... "GET /metrics/ HTTP/1.1" 500 - "http:/1.2.3.4:8080/metrics" "Prometheus/2.42.0" 60001
```

Running `pulsar-admin broker-stats monitoring-metrics` does return metrics, with values showing data moving through. But running `curl http://localhost:8080/metrics/` from inside the pulsar-broker-0 pod only times out.

The pulsar-proxy instance was upgraded at the same time, and it reports metrics as expected with no timeouts when I run `curl http://localhost:8080/metrics/` from within the pulsar-proxy-0 pod.

The error appears to come from this line in the code, but so far I do not see any of the other possible log messages; the async process just seems to time out every time.

https://github.com/apache/pulsar/blob/branch-3.0/pulsar-broker/src/main/java/org/apache/pulsar/broker/stats/prometheus/PulsarPrometheusMetricsServlet.java#L83

Any help would be greatly appreciated!

GitHub link: https://github.com/apache/pulsar/discussions/22897
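For anyone trying to reproduce the symptom, a small probe along the following lines may help separate the 302 redirect on `/metrics` from the timeout on `/metrics/`. This is only a sketch and not part of Pulsar: the `probe` function name, the `metrics-probe` user agent, and the 65-second default timeout (chosen to sit just above the roughly 60001 shown in the request log) are all assumptions made for illustration. It uses Python's standard `http.client`, which does not follow redirects, so each path is timed on its own.

```python
import http.client
import time


def probe(host, port, path, timeout=65.0):
    """Send one GET without following redirects.

    Returns (status, info, seconds): status is the HTTP status code, or None
    if the request failed; info is the Location header on success (None if
    absent) or the repr of the exception on failure.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    start = time.monotonic()
    try:
        conn.request("GET", path, headers={"User-Agent": "metrics-probe"})
        resp = conn.getresponse()
        resp.read()  # drain the body so the timing covers the full response
        return resp.status, resp.getheader("Location"), time.monotonic() - start
    except (OSError, http.client.HTTPException) as exc:
        # Socket timeouts and refused connections are OSError subclasses.
        return None, repr(exc), time.monotonic() - start
    finally:
        conn.close()
```

Calling `probe("localhost", 8080, "/metrics")` and then `probe("localhost", 8080, "/metrics/")` against a broker and against a proxy would show whether only the trailing-slash path on the broker is slow, and how long it takes before the 500 comes back.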
