Danyal Prout created SOLR-13234:
-----------------------------------
Summary: Prometheus Metric Exporter Not Threadsafe
Key: SOLR-13234
URL: https://issues.apache.org/jira/browse/SOLR-13234
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Components: metrics
Affects Versions: 7.6, 8.0
Reporter: Danyal Prout
Fix For: 8.x, master (9.0)
The Solr Prometheus Exporter collects metrics when it receives a HTTP request
from Prometheus. Prometheus sends this request, on its [scrape
interval|https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config].
When the time taken to collect the Solr metrics is greater than the scrape
interval of the Prometheus server, this results in concurrent metric collection
occurring in this
[method|https://github.com/apache/lucene-solr/blob/master/solr/contrib/prometheus-exporter/src/java/org/apache/solr/prometheus/collector/SolrCollector.java#L86].
This method doesn’t appear to be thread safe, for instance you could have
concurrent modifications of a
[map|https://github.com/apache/lucene-solr/blob/master/solr/contrib/prometheus-exporter/src/java/org/apache/solr/prometheus/collector/SolrCollector.java#L119].
After a while the Solr Exporter processes becomes nondeterministic, we've
observed NPE and loss of metrics.
To address this, I'm proposing the following fixes:
1. Read/parse the configuration at startup and make it immutable.
2. Collect metrics from Solr on an interval which is controlled by the Solr
Exporter and cache the metric samples to return during Prometheus scraping.
Metric collection can be expensive, for example executing arbitrary Solr
searches, it's not ideal to allow for concurrent metric collection and on an
interval which is not defined by the Solr Exporter.
There are also a few other performance improvements that we've made while
fixing this, for example using the ClusterStateProvider instead of sending
multiple HTTP requests to each Solr node to lookup all the cores.
I'm currently finishing up these changes which I'll submit as a PR.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]