mlbiscoc commented on code in PR #3713:
URL: https://github.com/apache/solr/pull/3713#discussion_r2403538093


##########
solr/solrj-zookeeper/src/java/org/apache/solr/client/solrj/impl/NodeValueFetcher.java:
##########
@@ -197,10 +173,12 @@ private void getRemoteMetricsFromTags(
       SimpleSolrResponse rsp =
          ctx.cloudSolrClient.getHttpClient().requestWithBaseUrl(baseUrl, req::process);
 
-      for (Metrics t : requestedMetricNames) {
-        Object value = t.extractResult(rsp.getResponse());
-        if (value != null) {
-          ctx.tags.put(t.tagName, value);
+      try (InputStream prometheusStream = (InputStream) rsp.getResponse().get(STREAM_KEY)) {
+        for (Metrics t : requestedMetricNames) {
+          Object value = t.extractFromPrometheus(prometheusStream);

Review Comment:
   Sorry, that's a bad blunder on my part. Yes, there is a test to catch this, but I had refactored things so that only the `cores` metric made it to this code path, so the test passed even though the input stream wasn't being passed in the first iteration.
   
   After moving things around, the test failed. It's because NodeValueFetcher was written with hardcoded strings and metric names for the "well known metrics", and I had mistakenly removed some of them. I then tried changing this to read those metrics from the stream, but the code changes got far more confusing because of how much custom overriding was needed, e.g. some metrics "add cores metrics" while others just "return the first value".
   
   In the end, I found this PR and all of this metric-fetching code frustrating to work with and understand as it already is. Instead of making it even more confusing, I just added a TODO and loaded the metrics into a list. There aren't many "well known metrics" anyway, and they are all node level (not high cardinality), so the response shouldn't be large and loading it into memory shouldn't be a problem.
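   
   Something roughly like this is what I mean by "loaded it in a list" (a minimal sketch; the method and variable names are hypothetical, not the actual PR code, and standard JDK imports are omitted):
   
   ```java
   // Rough sketch: read the Prometheus exposition text fully into memory so the
   // "well known metrics" can be looked up by name afterwards.
   private List<String> loadPrometheusLines(InputStream prometheusStream) throws IOException {
     List<String> lines = new ArrayList<>();
     try (BufferedReader reader =
         new BufferedReader(new InputStreamReader(prometheusStream, StandardCharsets.UTF_8))) {
       String line;
       while ((line = reader.readLine()) != null) {
         // Skip blank lines and "# HELP" / "# TYPE" comment lines
         if (line.isEmpty() || line.startsWith("#")) {
           continue;
         }
         lines.add(line);
       }
     }
     return lines;
   }
   ```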
   
   Honestly, looking at this whole PR, I don't think the NodeValueFetcher and SolrClientNodeStateProvider code should have done all of this "metric double/integer -> system properties (strings) -> converter" work in a single map. It's incredibly difficult to maintain, especially since the metrics are hardcoded, and trying to stitch it all together is a nightmare. Now we are reading the Prometheus format and shimming it with a bunch of parsing.
   
   That isn't what the /admin/metrics endpoint is intended for. We should probably do something like implement our own OTel `MetricReader` instance (a SolrMetricReader) and expose those "well known metrics" in a structured format that is easier for NodeValueFetcher or other internal callers to read. Might be too late now, but just my thoughts....
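   
   To make that concrete, here is a very rough sketch of the SolrMetricReader idea. It is illustrative only: it assumes a recent opentelemetry-sdk where `MetricReader` and `CollectionRegistration#collectAllMetrics` are public API, and the class and `read()` method are made-up names, not anything in this PR:
   
   ```java
   import io.opentelemetry.sdk.common.CompletableResultCode;
   import io.opentelemetry.sdk.metrics.InstrumentType;
   import io.opentelemetry.sdk.metrics.data.AggregationTemporality;
   import io.opentelemetry.sdk.metrics.data.MetricData;
   import io.opentelemetry.sdk.metrics.export.CollectionRegistration;
   import io.opentelemetry.sdk.metrics.export.MetricReader;
   import java.util.Collection;
   import java.util.Collections;
   
   // Illustrative only: a pull-based reader that internal callers (e.g. NodeValueFetcher)
   // could query directly instead of parsing /admin/metrics output.
   public final class SolrMetricReader implements MetricReader {
     private volatile CollectionRegistration registration;
   
     @Override
     public void register(CollectionRegistration registration) {
       this.registration = registration;
     }
   
     /** Pull the current in-memory metrics; callers filter down to the names they need. */
     public Collection<MetricData> read() {
       return registration == null ? Collections.emptyList() : registration.collectAllMetrics();
     }
   
     @Override
     public AggregationTemporality getAggregationTemporality(InstrumentType instrumentType) {
       return AggregationTemporality.CUMULATIVE;
     }
   
     @Override
     public CompletableResultCode forceFlush() {
       return CompletableResultCode.ofSuccess();
     }
   
     @Override
     public CompletableResultCode shutdown() {
       return CompletableResultCode.ofSuccess();
     }
   }
   ```
   
   NodeValueFetcher could then ask something like this for exactly the tags it needs in a structured form, rather than stitching together hardcoded metric names and parsed Prometheus text.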
   
   


