mlbiscoc commented on code in PR #3713:
URL: https://github.com/apache/solr/pull/3713#discussion_r2403538093
##########
solr/solrj-zookeeper/src/java/org/apache/solr/client/solrj/impl/NodeValueFetcher.java:
##########
@@ -197,10 +173,12 @@ private void getRemoteMetricsFromTags(
SimpleSolrResponse rsp =
ctx.cloudSolrClient.getHttpClient().requestWithBaseUrl(baseUrl,
req::process);
- for (Metrics t : requestedMetricNames) {
- Object value = t.extractResult(rsp.getResponse());
- if (value != null) {
- ctx.tags.put(t.tagName, value);
+ try (InputStream prometheusStream = (InputStream)
rsp.getResponse().get(STREAM_KEY)) {
+ for (Metrics t : requestedMetricNames) {
+ Object value = t.extractFromPrometheus(prometheusStream);
Review Comment:
Sorry thats a bad blunder on my part. Yes there is a test to catch this, but
I had refactored it out that only `cores` metric made it to this code path so
it passed and the input stream wasn't passed the first iteration.
Moved things around and now it failed. Its because the way NodeValueFetcher
was written had these hardcoded strings and metric names for "well known
metrics" and I mistakenly removed some. Now I tried changing this to read these
metrics from the stream but the code changes got way more confusing because of
how much custom overriding this had to do like "add cores metrics" but others
just "return the first value".
In the end, I found this PR and all this metric fetching code frustrating to
work with and understand as it is already. Instead of making it way more
confusing, I just added a TODO and loaded it in a list. There isn't many "well
known metrics" anyways and they are all node level (Not huge cardinality) so I
think the response shouldn't be huge and loading it into memory shouldn't be
bad.
Honestly looking at this whole PR, I don't think this NodeValueFetcher and
SolrClientNodeStateProvider code should have done all this "metric
double/integer -> system properties (strings) -> converter" all in a single
map. Its incredibly difficult to maintain especially since it is hard coded
metrics and trying to stitch it all together is a nightmare. Now we are reading
prometheus format and shimming it with a bunch of parsing.
That shouldn't be what the /admin/metrics endpoint is intended for. We
should probably do something like implement our own `MetricReader` otel
instance (SolrMetricReader) and expose those "well known metrics" in a
structured format that is easier to read for this NodeValueFetchers or for
internal callers. Might be too late now but just my thoughts....
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]