[ 
https://issues.apache.org/jira/browse/SOLR-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944393#comment-15944393
 ] 

Hoss Man commented on SOLR-9959:
--------------------------------



I don't understand this comment...

bq. ... This makes it impossible to ask the API for all metrics for a 
component, which is what the /admin/mbeans used to provide. On the other hand, 
the /admin/metrics API provides a much more structured view of related metrics 
even if they come from different components.

Why is this impossible?

I see that on the jira/solr-9959 branch, using {{/admin/mbeans?stats=true}} 
(and the Plugin/Stats screen in the UI) is useless (the stats param is now 
completely ignored) but meanwhile if I go to {{/admin/metrics}} I can "ask" for 
all of the stats for all components (or filter the list a variety of ways) ... 
so why can't {{/admin/mbeans}} use the same code {{/admin/metrics}} to 
loop/filter over the reported metrics for each "mbean" ?

Specifically...

* Since I (as a user) can do these:
** 
http://localhost:8983/solr/admin/metrics?group=solr.core.techproducts&prefix=CACHE
** 
http://localhost:8983/solr/admin/metrics?group=solr.core.techproducts&prefix=CACHE.searcher.queryResultCache
* Why can't I get the same info from these:
** http://localhost:8983/solr/techproducts/admin/mbeans?stats=true&cat=CACHE
** 
http://localhost:8983/solr/techproducts/admin/mbeans?stats=true&cat=CACHE&key=queryResultCache

?

I don't mind breaking back compat (particularly for internal java APIs) when 
needed to make overall improvements but in this case it seems like we're 
breaking HTTP request/param level APIs (and the Admin UI) unneccessarily if the 
underlying info is still accessible from {{/admin/metrics}}.

At worst it seems like maybe we could add a new 
{{SolrMetricManager.SuffixFilter}} to let {{/admin/mbeans}} "search" for 
metrics assocaited with the {{cat}} and {{key}} combo it's currently dealing 
with as it loops over items?

But AFAICT an even cleaner/simpler solution would be:
* add a {{default MetricsMap getMetricsMap() {return null;} }} to 
{{SolrInfoBean}} (to replace {{getStatistics()}})
* any class implementing both {{SolrInfoBean}} and {{SolrMetricProducer}} 
_could_ implement {{initializeMetrics(...)}} such that it keeps a private 
reference to a {{MetricsMap}} it registers & return that from 
{{getMetricsMap()}}
** many of the {{SolrInfoBean}} classes already seem to be maintaining a 
{{private MetricsMap metricsMap;}} that is assigned in 
{{initializeMetrics(...)}} but never used (in the class) after that
* {{/admin/mbeans}} can call {{getMetricsMap()}} on each {{SolrInfoBean}} it 
loops over if {{stats=true}}

(I realize even if we do this a lot of things available from {{/admin/metrics}} 
still wouldn't be available from {{/admin/mbeans?stats=true}} -- but that's 
totally reasonable.  More significantly, this would -- IIUC -- let us ensure 
everything *currently* available from {{/admin/mbeans?stats=true}} is *still* 
available moving forward, minimizing breakage for existing consumers of 
{{/admin/mbeans}}.  If/when they want more advanced stuff, they can switch to 
{{/admin/metrics}})

----

Other misc quesions/comments about the branch in no particular order...

* in general it would be helpful if this branch/jira included a text file 
listing all the major release-note/ref-guide updates needed once this lands so 
people can fully grasp whats changing from a users perspective: 
{{/admin/mbeans}} vs {{/admin/metrics}}, enabling JMX, JMX config options no 
longer supported (see below), etc...

* {{/admin/metrics}} 
** Why doesn't {{/admin/metrics}} expose an param for the 
{{SolrMetricManager.RegexFilter}} ?
*** Maybe it does and it's just not obvious?
** (When) Are we going to expose {{/admin/metrics}} via the Admin UI?
*** Even if we "fix" the problems w/ {{/admin/mbeans?stats=true}} mentioned 
above (so that the Plugins/Stats screen starts working) having some sort of UI 
screen exposting *all* the metrics now supported seems like a good idea.
*** If we can't fix the {{/admin/mbeans?stats=true}} the way i suggested above, 
then adding a roughly equivilent UI screens using {{/admin/metrics}} seems like 
it should be a blocker for landing this branch.

* This branch modifies the {{ivy.xml}} and IDE config files for 
dataimporthandler to have a compile dependency on 
{{io.dropwizard.metrics/metrics-core}} -- but that doesn't actaully seem to be 
neccessary to compile/test DIH
** temp/abandondoned API refactorying that needs reverted?

* {{LFUCache}} and {{FastLRUCache}} still have comments that refer to the 
(removed) {{getStatistics()}} method.

* I see this comment about {{fieldCache}} registration in SolrCore...{code}
SolrFieldCacheBean solrFieldCacheBean = new SolrFieldCacheBean();
// XXX this should be registered at the CONTAINER level because it's not 
core-specific!
solrFieldCacheBean.initializeMetrics(metricManager, 
coreMetricManager.getRegistryName(), null);
infoRegistry.put("fieldCache", solrFieldCacheBean);
{code}
** Since {{UninvertingReader}} is now a solr level class, and we're making 
non-compat changes for 7.0 anyway, why don't we:
*** Register a "global" {{new SolrFieldCacheBean()}} at the container level as 
the comment suggests we should.
*** Add a new verion of {{UninvertinterReader.getUninvertedStats(IndexReader)}} 
that filters the {{CacheEntry[]}} based on the {{IndexReader.CacheKey}} of the 
reader passed in ... _and any of it's leaf readers!_
**** Or ... I suppose technically we should recursively use 
{{reader.getContext().getChildren()}} because there might be intermediate 
wrappers? maybe?
*** use the new {{UninvertinterReader.getUninvertedStats(IndexReader)}} in a 
{{new SolrFieldCacheBean(searcher.getIndexReader())}} registered at the 
core/searcher level via {{CACHE.searcher.fieldCache}} (like all the other 
searcher/reader related caches)
** Even if we want to punt the idea for a new 
{{UninvertinterReader.getUninvertedStats(IndexReader)}} method to a new jira 
and wory about it later, I don't see any reason why _this_ jira/branch 
shouldn't go ahead and move the {{fieldCache}} metrics to be a the container 
level (like the comment suggests)
*** for backcompat, we can still (for now) put a {{new SolrFieldCacheBean()}} 
in the {{infoRegistry}} of every core ... assuming we _also_ fix 
{{/admin/mbeans?stats=true}} as i suggested above, then those {{fieldCache}} 
"stats" should still be available as they always have been {{/admin/mbean}} 
users.

* MetricType
** this error (String) shouldn't be hardcoded, we should build it up from 
{{EnumSet.allOf(MetricType.class)}}...{code}
    } catch (IllegalArgumentException e) {
      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Invalid 
metric type in: " + types + " specified. Must be one of (all, meter, timer, 
histogram, counter, gauge)", e);
    }
{code}
** We could have a {{public static final String SUPPORTED_TYPES_MSG = ...}} in 
{{MetricType}} to create this String-ified list only once on class load.

* JMX, in general, doesn't seem to be working as documented (in the ref guide) 
for the examples on this branch
** ie: {{bin/solr -e techproducts -Dcom.sun.management.jmxremote}} + jconsole 
shows no (solr) level metrics/stats.
** It appears this is because (on this branch) solr metrics are only exposed 
via JMX if there is a {{SolrJmxReporter}} configured in {{solr.xml}} ?
*** if so this is a pretty big change, and definitely needs some docs/release 
noting it.
*** It also raises some big questions:
**** Should {{solr/server/solr/solr.xml}} be updated to include 
{{SolrJmxReporter}} so it works with the examples ?
**** At a minimum should it be updated to conditionally set a reporter based on 
sysprops by {{bin/solr}} based on {{ENABLE_REMOTE_JMX_OPTS}} ?
** Personally, I think we should consider implicitly registering a 
{{SolrJmxReporter}} based on the same criteria that are currently checked on 
master when an empty {{<jmx/>}} is found: 
{{null!=JmxUtil.findFirstMBeanServer()}}
*** that way if users configure a JMX MBeanServer via {{bin/solr}} or 
{{solr.in.sh}} options, they'll automatically get Solr Metrics exposed via JMX
*** the only reason the _*very*_ legacy {{JmxMonitoredMap}} based code was 
written to require {{<jmx/>}} before doing any of this was in case a user was 
deplying Solr to an appserver along with many other apps, and you *only* wanted 
JMX stats from your appserver or other apps -- but not Solr.
**** That ship has sailed and I don't think we need to worry about it.
**** if people start {{bin/solr}} with options to enable JMX, let's go ahead 
and expose Solr Metrics via JMX
**** if, for some reason, some users really want jetty level JMX stats, but not 
solr level JMX, then let's offer some type of option to _disable_ Solr level 
JMX metric reporting ... perhaps a {{NoopSolrJmxReporter}} they can be 
configure in {{solr.xml}} ?


* SolrConfig & JmxMonitoredMap
** with these changes, it's very clear that {{<jmx .../>}} configuration in 
{{solrconfig.xml}} is now deliberately ignored, but...
*** nearly all of the sample (and many of the test) {{solrconfig.xml}} files on 
this branch still include/document {{<jmx .../>}} and it's varous options.  All 
of that still needs cleaned up on this branch before merging to master.
*** at a minimum: if a {{<jmx/>}} XML node is found in a {{solrconfig.xml}} 
file, Solr should log a WARN/ERROR w/message indicating that syntax is no 
longer supported & being ignored (and give people a pointer to the new way to 
configure JMX)
** how much of the existing config options that {{<jmx .../>}} supports 
are/should be available with {{SolrJmxReporter}} in {{solr.xml}} ?
** the old style config supported *per-core* {{agentId}}, {{serviceUrl}}, or 
{{rootName}} attributes...
*** {{rootName}} doesn't seem to be supported in {{SolrJmxReporter}}
**** this is probably fine.
**** that option dates back to deploying multiple solr.war files to a single 
appserver, something we're not supporting anymore
*** the {{agentId}} option was created to address the possibility of multiple 
JMX MBeanServers existing in the JVM on startup (possible because multiple apps 
each launched their own) and this way you could deterministicly indicate which 
one Solr should service it's stats from.
**** it looks like this is currently supported by {{SolrJmxReporter}}, but i'm 
not sure if it needs to be? i guess it doesn't hurt?
*** how {{serviceUrl}} is handled is something I'm more on the concerned 
about...
**** the use case motivation of supporting this attribute in the {{<jmx ... 
/>}} config syntax was that the JVM might/might-not be configured to service a 
platform level MBeanServer and serve jVM metrics, but each individual core 
could individually run it's own MBeanServer (w/it's own port & security 
options) -- to better isolate what stats diff JMX clients could see.
**** This {{serviceUrl}} feature never really evolved with SolrCloud, but 
perhaps now is the time when it should?
**** As implemented in {{SolrJmxReporter}} this appears (AFAICT) to now only be 
configurable at a "container wide" level (in {{solr.xml}})
**** It would be nice if there was a way to configure multiple JMX MBeanServer 
instances (on each solr node) that only expose metrics from specific 
collections (such that metrics from those collections would not be available 
via the platform MBeanServer even if it was enabled via started by the command 
line options).
***** Not sure if/where/how it would make sense to expose some configuration 
like this, or even if it's worth pursuing (let alone pursuing right now) but I 
wanted to point out the original point of {{serviceUrl}} in case anyone has any 
ideas.
***** Could we perhaps enhance the {{SolrMetricReporter}} base class so that 
all types of reporters could be configured with inclusion/exclusion rules based 
on collection name?

* Testing JMX w/solr.xml changes...
** I manually modified {{solr/server/solr/solr.xml}} to include a 
{{SolrJmxReporter}} to continue to test JMX support
** With this change, using {{bin/solr -e techproducts 
-Dcom.sun.management.jmxremote}} + jconsole did start to expose a lot of 
metrics via JMX -- but many of the solr realated MBeans exposed don't seem very 
useful...
*** Notably: when I went to look at the {{CACHE}} related MBeans (ex: 
filterCache), each one was exposing only a single "Attribute" named "Value" 
(with MBeanAttributeValueInfo.Type==Object) which appears to contain the 
toString (or perhaps an array of name=value strings?) of the various cache 
stats (ie: lookups, cumulative_lookups, hitratio, etc...)
*** This is vastly different from master where this sort of toString info is 
used as the MBean's _Description_ but each one of those individual "metrics" 
are exposed as individual MBean _Attributes_ with the expected 
MBeanAttributeValueInfo.Type (Long, Long, Float, etc...)
*** Even in the single "Value" Attribute that is exposed, some really useful 
info currently available as MBean Attributes seems to be missing, notably: 
"description" (from the toString()), and "name" (the class implementing the 
cache)
*** Not being able to access the individual (strongly typed!) cache "stats" as 
MBean Attributes seems like a major step backwards?
** perplexingly, w/ the modified {{solr.xml}} file, running {{bin/solr -e 
techproducts}} (NOTE: no command line args to use JMX) still caused Solr to run 
an MBeanServer that I could connect to with jconsole and use to view the same 
solr metrics.
*** presumably this is because of the this code in {{SolrJmxReporter}} which 
doesn't really make sense to me....{code}
      ManagementFactory.getPlatformMBeanServer(); // Ensure at least one 
MBeanServer is available.
{code}
*** Setting asside my earlier suggestions that we should consider changing our 
"default" behavior when {{bin/solr}} / {{solr.in.sh}} is configured to run the 
JVM with JMX enabled, this code as is doesn't follow the existing precendent of 
{{JmxMonitoredMap}} to be a no-op unless:
**** an MBeanServer is started at the process/appserver level (ie: command line 
args)
**** an agentId or serviceUrl is explicitly configured
*** In general it seems like the the _intended_ where/how/why to configure Solr 
to publish metrics via JMX is now fairly confusing...
**** if the intent here is that moving forward users _must_ configure a 
{{SolrJmxReporter}} in {{solr.xml}}, and they _must_ configure a {{serviceUrl}} 
at that point to control access, then we definitely need to think through how 
we're deprecating/replacing the various things {{bin/solr}} currently does with 
{{ENABLE_REMOTE_JMX_OPTS=true}} (and if/how serviceUrl's can support 
configuration of things like {{com.sun.management.jmxremote.local.only}}, 
etc...  -- i have no idea if they can)
*** My 2cents: expanding on my suggestion above regarding solr's default JMX 
behavior...
**** I think all calls to {{ManagementFactory.getPlatformMBeanServer()}} should 
be removed from {{SolrJmxReporter}}, and that class should be a No-Op unless: 
either agentId/serviceUrl is explicitly configured *OR* an already existing 
MBeanServer is found (via {{JmxUtil.findFirstMBeanServer()}})
**** I think that if no metrics reporters (or perhaps specifically no 
{{SolrJmxReporter}} instances?) are configured in {{solr.xml}}, CoreContainer 
(or some equivilent) should check {{JmxUtil.findFirstMBeanServer()}} and 
implicily register an {{SolrJmxReporter}} instance if the MBeanServer is 
non-null.
**** that way command line JVM options (should be) the end all be decider as to 
whether a Platform MBeanServer is launched, and if so: then Solr metrics be 
exposed via JMX by default.  If users want something more fancy they can 
override with explicit {{SolrJmxReporter}} / {{SolrJmxNoopReporter}} 
configurations

* other uses of {{ManagementFactory.getPlatformMBeanServer()}} ...
** I see now that as part of the earlier metrics work, {{SolrDispatchFilter}} 
was also modified to call {{ManagementFactory.getPlatformMBeanServer()}} -- 
evidently so that the Metrics API can access JVM level MBeans?
*** This explains why (on both this branch and on master) running {{bin/solr}} 
is all it takes to be able to connect to the process via jconsole -- even w/o 
{{solr.xml}} changes or command line args requesting that the JVM enable JMX 
services.
*** I don't think it's a good idea for Solr to be forcing the JVM to spin up 
MBeanServer instances & accepting JMX connections, w/o the user 
running/configuring solr explicitly requesting that -- particularly in the case 
of this {{SolrDispatchFilter}} code which seems to run even if the user doesn't 
care about metrics/JMX at all!
*** It seems like the {{BufferPoolMetricSet}} and {{OperatingSystemMetricSet}} 
classes used in {{SolrDispatchFilter}} should probably be using the various 
{{*MXBean}} impls available directly from {{ManagementFactory}} w/o needing an 
MBeanServer to be running
**** see examples of doing this in {{SystemInfoHandler}}
*** but if an {{MBeanServer}} is really the only way to bridge these type of 
OS/JVM level info into the metrics code then at a minimum we should change 
{{SolrDispatchFilter}} to also use {{JmxUtil.findFirstMBeanServer()}} and if 
the {{MBeanServer}} returned is null skip registering these metrics (in their 
place: log an info message, and/or register a simple String constnat metric, 
that certain metrics are only available if JMX is enabled)
** In general, I think it's a really bad idea for any (non-test) Solr level 
code to be calling {{ManagementFactory.getPlatformMBeanServer()}} ... we should 
probably consider marking it as a forbidden-api!

* I'm -1 on removing {{TestJmxIntegration}}
** if we need to change the JMX ObjectNames we look for to match the new 
metrics based code, and/or change the test init to ensure {{SolrJmxReporter}} 
is used by the CoreContainer (depending on what decisions are made about 
default behavior) -- then so be it.
** but it seems really important to have a simple test like this sanity 
checking some simple Solr metrics via a JMX {{MBeanServer}}
*** Yes i see {{CollectionsAPIDistributedZkTest}} and {{SolrJmxReporterTest}}...
**** {{CollectionsAPIDistributedZkTest}} is trusting the JMX stats to check 
that instanceDirs don't collide, not doing anything to ensure that JMX is 
reutrning expected values.   (Independent of the current Jira, this test should 
almost certaily be re-written to check this some other way)
**** {{SolrJmxReporterTest}} creates some random metrics -- that's not really a 
good _integration_ test of the metrics/JMX code to check that external clients 
can find specific Solr metrics like Searcher's "numDocs" (or filterCache's 
"lookups"), or that the values of those metrics are updated when expected (and 
to the expected value) based on actions that happen in solr (ie: docs added to 
the index, queries executed with an fq, etc...)

* RequestHandlersTest
** testStatistics has been removed, but why isn't there a new equivilent 
replacement test/asserts using some internal metrics API?

* CollectionsAPIDistributedZkTest
** Note this change...{noformat}
         } catch (Exception e) {
+          log.info(e.toString());
           // ignore, just continue - probably a "category" or "source" 
attribute
           // not found
         }
{noformat}
** if this comment is still valid: then logging the exception here seems like 
shot term debuging code that should not be committed
** if this comment is no longer valid based on the other changes in the loop 
above this, then the comment should be deleted
** if the log message is still useful, then it should log the {{Exception}} 
(w/stack trace), not the {{toString()}}

* MetricUtils
** all of these (newly) public methods should have javadocs explaining their 
function / intended purpose

* SolrCore
** using {{\_notset\_}} and {{\_auto\_}} as special values here for the 
collection & shardId metrics seems like a really bad idea.  In both cases...
*** if we're not in cloud mode, then it seems like we shouldn't report a 
collection name or a shardId as part of the metrics at all
*** if we're in cloud mode, then isn't it an error if either collection name or 
shardId are null?
**** even if there is some situation i'm not thinking of where these might be 
legitimately null (in cloud mode), can't/shouldn't we just return {{NULL}} for 
these metric instead of trying to use magic string values that client code 
might missinterpret?
**** if {{NULL}} isn't an option, then throwing an exception seems better then 
using magic strings that might be missinterpreted/missused by client code
**** IE: I'd rather some automated client code get a failure trying to access 
the metric named "collection" then get a value that the client code might try 
to use down the line to query for a collection and gets a weird 404 error later.




> SolrInfoMBean-s category and hierarchy cleanup
> ----------------------------------------------
>
>                 Key: SOLR-9959
>                 URL: https://issues.apache.org/jira/browse/SOLR-9959
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: metrics
>    Affects Versions: master (7.0)
>            Reporter: Andrzej Bialecki 
>            Assignee: Andrzej Bialecki 
>            Priority: Blocker
>             Fix For: master (7.0)
>
>         Attachments: SOLR-9959.patch
>
>
> SOLR-9947 changed categories of some of {{SolrInfoMBean-s}}, and it also 
> added an alternative view in JMX, similar to the one produced by 
> {{SolrJmxReporter}}.
> Some changes were left out from that issue because they would break the 
> back-compatibility in 6.x, but they should be done before 7.0:
> * remove the old JMX view of {{SolrInfoMBean}}-s and improve the new one so 
> that it's more readable and useful.
> * in many cases {{SolrInfoMBean.getName()}} just returns a FQCN, but it could 
> be more informative - eg. for highlighter or query plugins this could be the 
> symbolic name of a plugin that users know and use in configuration.
> * top-level categories need more thought. On one hand it's best to minimize 
> their number, on the other hand they need to meaningfully represent the 
> functionality of components that use them. SOLR-9947 made some cosmetic 
> changes, but more discussion is necessary (eg. QUERY vs. SEARCHHANDLER)
> * we should consider removing some of the methods in {{SolrInfoMBean}} that 
> usually don't return any useful information, eg. {{getDocs}}, {{getSource()}} 
> and {{getVersion()}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to