GitHub user justin-lathrop edited a discussion: Pulsar upgrade to 3.0.5 causes 
Prometheus metrics timeouts on brokers

After upgrading from Pulsar 2.9.4 to Pulsar 3.0.5 in a Kubernetes cluster 
using version 3.0 of the Pulsar Helm chart, Prometheus metrics stopped working 
on the pulsar-broker pods.

The pulsar-broker logs show a constant stream of 302 redirects and of 500 
responses accompanied by timeout warnings:

```
... INFO org.eclipse.jetty.server.RequestLog - ... "GET /metrics HTTP/1.1" 302 0 "-" "Prometheus/2.42.0" 0
... WARN org.apache.pulsar.broker.stats.prometheus.PulsarPrometheusMetricsServlet - Prometheus metrics request timed out
... INFO org.eclipse.jetty.server.RequestLog - ... "GET /metrics/ HTTP/1.1" 500 - "http:/1.2.3.4:8080/metrics" "Prometheus/2.42.0" 60001
```
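To see how often each status shows up over a longer stretch of broker logs, the request-log lines above can be tallied with a short script. This is a minimal Python sketch, not a Pulsar tool: it assumes the Jetty request-log layout shown in the excerpt, and it assumes the trailing number on each request line is the request latency in milliseconds (the `60001` on the 500 line would be consistent with a roughly 60-second timeout). The function name `summarize_metrics_requests` is hypothetical.

```python
import re
from collections import Counter

# Assumed request-log layout (taken from the excerpt above):
#   ... "METHOD PATH PROTOCOL" STATUS SIZE "REFERER" "USER-AGENT" LATENCY
# The trailing LATENCY field is ASSUMED to be in milliseconds.
_REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" (?P<latency_ms>\d+)'
)


def summarize_metrics_requests(lines, path_prefix="/metrics"):
    """Tally metrics requests found in request-log lines.

    Returns (counts, latencies): counts maps (path, status) to the number of
    matching requests; latencies lists the trailing latency field of each one.
    Lines that are not request-log entries (e.g. WARN lines) are skipped.
    """
    counts = Counter()
    latencies = []
    for line in lines:
        match = _REQUEST_RE.search(line)
        if match is None or not match.group("path").startswith(path_prefix):
            continue
        counts[(match.group("path"), int(match.group("status")))] += 1
        latencies.append(int(match.group("latency_ms")))
    return counts, latencies
```

Fed the three lines from the excerpt, it counts one `("/metrics", 302)` request and one `("/metrics/", 500)` request, with latencies `[0, 60001]`; the WARN line is ignored because it has no quoted request.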

Running `pulsar-admin broker-stats monitoring-metrics` does return metrics, with 
values showing data moving through. However, exec'ing into the pulsar-broker-0 
pod and running `curl http://localhost:8080/metrics/` only times out.

The pulsar-proxy was also upgraded as part of this, and it reports metrics as 
expected with no timeouts: running `curl http://localhost:8080/metrics/` from 
within the pulsar-proxy-0 pod works.
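To compare the broker and proxy endpoints side by side, and to see the 302 on `/metrics` separately from the slow `/metrics/` response, a small probe can stand in for `curl`. This is a minimal standard-library Python sketch, assuming a Python 3 interpreter is available wherever it is run; the function name `probe`, the 10-second default timeout, and the option to disable redirect-following are illustrative choices, not anything Pulsar provides.

```python
import time
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so a 302 is reported instead of chased."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 302


def probe(url, timeout=10.0, follow_redirects=False):
    """Request url once and return (status, location, elapsed_seconds).

    status is None when the request timed out or could not connect.
    location is the Location header of an unfollowed redirect, else None.
    """
    handlers = [] if follow_redirects else [_NoRedirect()]
    opener = urllib.request.build_opener(*handlers)
    start = time.monotonic()
    try:
        with opener.open(url, timeout=timeout) as resp:
            resp.read()  # include body transfer time in the measurement
            return resp.status, None, time.monotonic() - start
    except urllib.error.HTTPError as err:
        # Non-2xx responses (unfollowed 302s, 500s) land here.
        return err.code, err.headers.get("Location"), time.monotonic() - start
    except OSError:
        # URLError (e.g. connection refused) and socket timeouts are both
        # OSError subclasses; report them as "no status".
        return None, None, time.monotonic() - start
```

For example, on a broker `probe("http://localhost:8080/metrics")` would be expected to report the 302 and its `Location` header, while `probe("http://localhost:8080/metrics/")` returning a status of `None` after the timeout would match the `curl` behaviour described above; the same calls inside the proxy pod give a baseline.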

The error appears to come from this line in the code. So far I do not see any 
of the other possible log messages; the async process just seems to time out 
every time.
https://github.com/apache/pulsar/blob/branch-3.0/pulsar-broker/src/main/java/org/apache/pulsar/broker/stats/prometheus/PulsarPrometheusMetricsServlet.java#L83

Any help would be greatly appreciated!

GitHub link: https://github.com/apache/pulsar/discussions/22897
