[
https://issues.apache.org/jira/browse/SOLR-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848396#comment-17848396
]
Matthew Biscocho commented on SOLR-10654:
-----------------------------------------
The JQ post-processing overhead is not the main problem, but it is certainly
one of them. Running the Prometheus exporter can be costly, especially at
scale when multiple instances are needed. It offers flexibility through
configuration, but I don't think that solves everyone's use case, since
running the Prometheus exporter is not free. I think the overhead of
aggregating metrics should sit at the Grafana or Prometheus level, while the
exposed metrics themselves should just be raw values. With this PR, Prometheus
can scrape Solr directly and the aggregation can be done in Grafana, which
skips the extra HTTP hops through the Prometheus exporter and the JQ processing.
I took a bit of time to compare the performance of my PR against the
Prometheus exporter. I created a cloud with 2 nodes and 50 collections to get a
bunch of metrics. For the cloud, I curl'd each node individually and captured
the response time of each node. I'm not sure whether Prometheus scrapes
sequentially or in parallel, but each node takes about 0.6s locally.
I modified the Prometheus exporter config to scrape only the metrics my PR
currently exports (the core registry) and added a few lines of code to time
the scraping and JQ processing. It was taking around 4-5 seconds per
collection interval, which is significantly longer.
`My PR:`
`curl -o /dev/null -s -w 'Total: %{time_total}s\n' 'localhost:8983/solr/admin/metrics?wt=prometheus'`
`Total: 0.614125s`
`curl -o /dev/null -s -w 'Total: %{time_total}s\n' 'localhost:7574/solr/admin/metrics?wt=prometheus'`
`Total: 0.597078s`
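For anyone who wants to repeat the per-node timing without curl, here is a minimal Python sketch of the same measurement. The helper names (`time_call`, `fetch`, `time_scrapes`) are made up for illustration and are not part of Solr or the PR. The URLs are the two local nodes used above. It measures wall-clock time for the full request and response body, which should be close to, but not identical to, curl's `time_total`.

```python
import time
import urllib.request
from typing import Callable, Dict, Iterable


def time_call(fn: Callable[[], object]) -> float:
    """Return the wall-clock seconds taken by a single call to fn."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def fetch(url: str) -> bytes:
    """Download the full response body, as a Prometheus scrape would."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def time_scrapes(
    urls: Iterable[str],
    fetcher: Callable[[str], object] = fetch,
) -> Dict[str, float]:
    """Time one sequential scrape per URL; fetcher is swappable for testing."""
    return {url: time_call(lambda u=url: fetcher(u)) for url in urls}


# The two local nodes from the curl commands (assumes a running cluster
# that serves wt=prometheus, i.e. a build with the PR applied).
NODES = [
    "http://localhost:8983/solr/admin/metrics?wt=prometheus",
    "http://localhost:7574/solr/admin/metrics?wt=prometheus",
]
```

Calling `time_scrapes(NODES)` against a running cluster returns a dict of URL to elapsed seconds. Running it several times and averaging would give a steadier number than a single sample.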
`Prometheus Exporter:`
INFO - 2024-05-21 18:10:28.930; org.apache.solr.prometheus.collector.SchedulerMetricsCollector; Completed metrics collection
INFO - 2024-05-21 18:11:28.923; org.apache.solr.prometheus.collector.SchedulerMetricsCollector; Beginning metrics collection
PT4.355627S
I suspect this is due to the HTTP calls and JQ processing the Prometheus
exporter has to do, while my PR does a straight internal conversion.
Although the conversion happens on every call, it doesn't seem to be as
costly as the Prometheus exporter.
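To put the numbers above side by side, here is a rough Python sketch that parses the exporter timing (`PT4.355627S`, an ISO-8601 duration) and compares it with the two curl results. The helper `parse_pt_seconds` is a hypothetical name and only handles the seconds-only form shown here. These are single samples from one local run, so treat the ratios as indicative only.

```python
import re


def parse_pt_seconds(text: str) -> float:
    """Parse a seconds-only ISO-8601 duration such as 'PT4.355627S'."""
    match = re.fullmatch(r"PT(\d+(?:\.\d+)?)S", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return float(match.group(1))


# Single samples copied from the measurements above.
direct_scrapes = [0.614125, 0.597078]          # curl time_total per node
exporter_cycle = parse_pt_seconds("PT4.355627S")

# If Prometheus scrapes the nodes in parallel, the slowest node dominates;
# if it scrapes sequentially, the per-node times add up.
ratio_parallel = exporter_cycle / max(direct_scrapes)
ratio_sequential = exporter_cycle / sum(direct_scrapes)

print(f"exporter vs. parallel direct scrape:   {ratio_parallel:.1f}x")
print(f"exporter vs. sequential direct scrape: {ratio_sequential:.1f}x")
```

Either way, the exporter run is several times slower than scraping the nodes directly in this setup.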
> Expose Metrics in Prometheus format DIRECTLY from Solr
> ------------------------------------------------------
>
> Key: SOLR-10654
> URL: https://issues.apache.org/jira/browse/SOLR-10654
> Project: Solr
> Issue Type: Improvement
> Components: metrics
> Reporter: Keith Laban
> Priority: Major
> Attachments: prometheus_metrics.txt
>
> Time Spent: 3h
> Remaining Estimate: 0h
>
> Expose metrics via a `wt=prometheus` response type.
> Example scrape_config in prometheus.yml:
> {code}
> scrape_configs:
>   - job_name: 'solr'
>     metrics_path: '/solr/admin/metrics'
>     params:
>       wt: ["prometheus"]
>     static_configs:
>       - targets: ['localhost:8983']
> {code}