Re: [D] Pulsar upgrade to 3.0.5 causes prometheus metrics timeouts on brokers [pulsar]

via GitHub Wed, 12 Jun 2024 08:01:27 -0700


GitHub user justin-lathrop edited a discussion: Pulsar upgrade to 3.0.5 causes 
prometheus metrics timeouts on brokers

After upgrading from Pulsar 2.9.4 to Pulsar 3.0.5 in a Kubernetes cluster using
version 3.0 of the Pulsar Helm chart, Prometheus metrics stopped working on the
pulsar-broker pods.

The pulsar-broker logs show a constant stream of 302 redirects and 500
responses caused by timeouts:

```
... INFO org.eclipse.jetty.server.RequestLog - ... "GET /metrics HTTP/1.1" 302 0 "-" "Prometheus/2.42.0" 0
... WARN org.apache.pulsar.broker.stats.prometheus.PulsarPrometheusMetricsServlet - Prometheus metrics request timed out
... INFO org.eclipse.jetty.server.RequestLog - ... "GET /metrics/ HTTP/1.1" 500 - "http://1.2.3.4:8080/metrics" "Prometheus/2.42.0" 60001
```
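To quantify how often and how slowly the scrapes fail, the request-log lines can be filtered mechanically. The following is a minimal sketch, assuming the quoted part of each Jetty request-log line has the layout seen above (request line, status, bytes, referer, user agent) and that the trailing number is the request duration in milliseconds (the `60001` next to a timeout suggests this, but it is an assumption). The names `parse_request_line`, `slow_metrics_requests`, and the 30-second threshold are hypothetical.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Assumed layout of a request-log entry (after the elided "..." prefix):
# "METHOD PATH PROTOCOL" STATUS BYTES "REFERER" "USER-AGENT" DURATION_MS
_REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" (?P<ms>\d+)\s*$'
)


class Request(NamedTuple):
    path: str
    status: int
    referer: str
    duration_ms: int


def parse_request_line(line: str) -> Optional[Request]:
    """Return the parsed request, or None if the line is not a request-log entry."""
    m = _REQUEST_RE.search(line)
    if m is None:
        return None
    return Request(m["path"], int(m["status"]), m["referer"], int(m["ms"]))


def slow_metrics_requests(
    lines: Iterable[str], threshold_ms: int = 30_000
) -> Iterator[Request]:
    """Yield /metrics requests whose duration is at or above threshold_ms."""
    for line in lines:
        req = parse_request_line(line)
        if req and req.path.startswith("/metrics") and req.duration_ms >= threshold_ms:
            yield req
```

Fed the broker's log lines, this would skip the instant 302 redirects and the WARN lines, and flag only the `/metrics/` requests that ran for roughly a minute before returning 500.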

Running `pulsar-admin broker-stats monitoring-metrics` does return metrics, with
values showing data moving through. However, after exec'ing into the
pulsar-broker-0 pod, `curl http://localhost:8080/metrics/` only times out.

The pulsar-proxy instance was also upgraded as part of this, and it reports
metrics as expected with no timeouts: running
`curl http://localhost:8080/metrics/` from within the pulsar-proxy-0 pod returns
metrics normally.
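The in-pod curl comparison between broker and proxy can also be scripted so that the redirect on `/metrics` and the slow response on `/metrics/` are observed separately. This is a minimal sketch using only the Python standard library; it sends one GET without following redirects, roughly like `curl -m SECONDS` without `-L`. The names `probe` and `ProbeResult` and the default timeout are hypothetical choices for illustration.

```python
import http.client
import socket
import time
from typing import NamedTuple, Optional


class ProbeResult(NamedTuple):
    status: Optional[int]    # None means the request timed out on the client side
    location: Optional[str]  # Location header, present on redirects such as 302
    elapsed_s: float


def probe(host: str, port: int, path: str, timeout_s: float = 10.0) -> ProbeResult:
    """Send a single GET without following redirects and time the full response."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout_s)
    start = time.monotonic()
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()  # drain the body so elapsed time covers the whole response
        return ProbeResult(resp.status, resp.getheader("Location"), time.monotonic() - start)
    except socket.timeout:
        # No complete response arrived within timeout_s.
        return ProbeResult(None, None, time.monotonic() - start)
    finally:
        conn.close()
```

Run from inside each pod as, for example, `probe("localhost", 8080, "/metrics")` followed by `probe("localhost", 8080, "/metrics/")`; on a healthy endpoint the first call would show the 302 and its `Location`, and the second a 200 well within the timeout.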

The warning appears to come from this line in the code, but at present I do not
see any of the other possible log messages; the request simply times out every
time in the async process:
https://github.com/apache/pulsar/blob/branch-3.0/pulsar-broker/src/main/java/org/apache/pulsar/broker/stats/prometheus/PulsarPrometheusMetricsServlet.java#L83
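The warning suggests that metrics generation ran past a fixed deadline while the HTTP request waited on it asynchronously. As a generic illustration only (this is not Pulsar's implementation, which is Java servlet code), the pattern of a handler that awaits a background generation task with a deadline and answers 500 when the deadline passes looks roughly like this. `serve_metrics` and `generate` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, Tuple


async def serve_metrics(
    generate: Callable[[], Awaitable[str]], timeout_s: float
) -> Tuple[int, str]:
    """Return (status, body): 200 with the metrics text, or 500 if generation misses the deadline."""
    try:
        body = await asyncio.wait_for(generate(), timeout=timeout_s)
        return 200, body
    except asyncio.TimeoutError:
        # Analogous to a "request timed out" warning: the generation task is
        # cancelled and the client gets an empty 500 instead of partial metrics.
        return 500, ""
```

In this sketch, a generator that is consistently slower than the deadline fails every single scrape rather than intermittently, which has the same shape as the symptom in the logs: each `/metrics/` request runs for the full timeout and then returns 500.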

Any help would be greatly appreciated!

GitHub link: https://github.com/apache/pulsar/discussions/22897

