[
https://issues.apache.org/jira/browse/SOLR-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944393#comment-15944393
]
Hoss Man commented on SOLR-9959:
--------------------------------
I don't understand this comment...
bq. ... This makes it impossible to ask the API for all metrics for a
component, which is what the /admin/mbeans used to provide. On the other hand,
the /admin/metrics API provides a much more structured view of related metrics
even if they come from different components.
Why is this impossible?
I see that on the jira/solr-9959 branch, using {{/admin/mbeans?stats=true}}
(and the Plugin/Stats screen in the UI) is useless (the stats param is now
completely ignored) but meanwhile if I go to {{/admin/metrics}} I can "ask" for
all of the stats for all components (or filter the list a variety of ways) ...
so why can't {{/admin/mbeans}} use the same code {{/admin/metrics}} to
loop/filter over the reported metrics for each "mbean" ?
Specifically...
* Since I (as a user) can do these:
**
http://localhost:8983/solr/admin/metrics?group=solr.core.techproducts&prefix=CACHE
**
http://localhost:8983/solr/admin/metrics?group=solr.core.techproducts&prefix=CACHE.searcher.queryResultCache
* Why can't I get the same info from these:
** http://localhost:8983/solr/techproducts/admin/mbeans?stats=true&cat=CACHE
**
http://localhost:8983/solr/techproducts/admin/mbeans?stats=true&cat=CACHE&key=queryResultCache
?
I don't mind breaking back compat (particularly for internal java APIs) when
needed to make overall improvements but in this case it seems like we're
breaking HTTP request/param level APIs (and the Admin UI) unneccessarily if the
underlying info is still accessible from {{/admin/metrics}}.
At worst it seems like maybe we could add a new
{{SolrMetricManager.SuffixFilter}} to let {{/admin/mbeans}} "search" for
metrics assocaited with the {{cat}} and {{key}} combo it's currently dealing
with as it loops over items?
But AFAICT an even cleaner/simpler solution would be:
* add a {{default MetricsMap getMetricsMap() {return null;} }} to
{{SolrInfoBean}} (to replace {{getStatistics()}})
* any class implementing both {{SolrInfoBean}} and {{SolrMetricProducer}}
_could_ implement {{initializeMetrics(...)}} such that it keeps a private
reference to a {{MetricsMap}} it registers & return that from
{{getMetricsMap()}}
** many of the {{SolrInfoBean}} classes already seem to be maintaining a
{{private MetricsMap metricsMap;}} that is assigned in
{{initializeMetrics(...)}} but never used (in the class) after that
* {{/admin/mbeans}} can call {{getMetricsMap()}} on each {{SolrInfoBean}} it
loops over if {{stats=true}}
(I realize even if we do this a lot of things available from {{/admin/metrics}}
still wouldn't be available from {{/admin/mbeans?stats=true}} -- but that's
totally reasonable. More significantly, this would -- IIUC -- let us ensure
everything *currently* available from {{/admin/mbeans?stats=true}} is *still*
available moving forward, minimizing breakage for existing consumers of
{{/admin/mbeans}}. If/when they want more advanced stuff, they can switch to
{{/admin/metrics}})
----
Other misc quesions/comments about the branch in no particular order...
* in general it would be helpful if this branch/jira included a text file
listing all the major release-note/ref-guide updates needed once this lands so
people can fully grasp whats changing from a users perspective:
{{/admin/mbeans}} vs {{/admin/metrics}}, enabling JMX, JMX config options no
longer supported (see below), etc...
* {{/admin/metrics}}
** Why doesn't {{/admin/metrics}} expose an param for the
{{SolrMetricManager.RegexFilter}} ?
*** Maybe it does and it's just not obvious?
** (When) Are we going to expose {{/admin/metrics}} via the Admin UI?
*** Even if we "fix" the problems w/ {{/admin/mbeans?stats=true}} mentioned
above (so that the Plugins/Stats screen starts working) having some sort of UI
screen exposting *all* the metrics now supported seems like a good idea.
*** If we can't fix the {{/admin/mbeans?stats=true}} the way i suggested above,
then adding a roughly equivilent UI screens using {{/admin/metrics}} seems like
it should be a blocker for landing this branch.
* This branch modifies the {{ivy.xml}} and IDE config files for
dataimporthandler to have a compile dependency on
{{io.dropwizard.metrics/metrics-core}} -- but that doesn't actaully seem to be
neccessary to compile/test DIH
** temp/abandondoned API refactorying that needs reverted?
* {{LFUCache}} and {{FastLRUCache}} still have comments that refer to the
(removed) {{getStatistics()}} method.
* I see this comment about {{fieldCache}} registration in SolrCore...{code}
SolrFieldCacheBean solrFieldCacheBean = new SolrFieldCacheBean();
// XXX this should be registered at the CONTAINER level because it's not
core-specific!
solrFieldCacheBean.initializeMetrics(metricManager,
coreMetricManager.getRegistryName(), null);
infoRegistry.put("fieldCache", solrFieldCacheBean);
{code}
** Since {{UninvertingReader}} is now a solr level class, and we're making
non-compat changes for 7.0 anyway, why don't we:
*** Register a "global" {{new SolrFieldCacheBean()}} at the container level as
the comment suggests we should.
*** Add a new verion of {{UninvertinterReader.getUninvertedStats(IndexReader)}}
that filters the {{CacheEntry[]}} based on the {{IndexReader.CacheKey}} of the
reader passed in ... _and any of it's leaf readers!_
**** Or ... I suppose technically we should recursively use
{{reader.getContext().getChildren()}} because there might be intermediate
wrappers? maybe?
*** use the new {{UninvertinterReader.getUninvertedStats(IndexReader)}} in a
{{new SolrFieldCacheBean(searcher.getIndexReader())}} registered at the
core/searcher level via {{CACHE.searcher.fieldCache}} (like all the other
searcher/reader related caches)
** Even if we want to punt the idea for a new
{{UninvertinterReader.getUninvertedStats(IndexReader)}} method to a new jira
and wory about it later, I don't see any reason why _this_ jira/branch
shouldn't go ahead and move the {{fieldCache}} metrics to be a the container
level (like the comment suggests)
*** for backcompat, we can still (for now) put a {{new SolrFieldCacheBean()}}
in the {{infoRegistry}} of every core ... assuming we _also_ fix
{{/admin/mbeans?stats=true}} as i suggested above, then those {{fieldCache}}
"stats" should still be available as they always have been {{/admin/mbean}}
users.
* MetricType
** this error (String) shouldn't be hardcoded, we should build it up from
{{EnumSet.allOf(MetricType.class)}}...{code}
} catch (IllegalArgumentException e) {
throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Invalid
metric type in: " + types + " specified. Must be one of (all, meter, timer,
histogram, counter, gauge)", e);
}
{code}
** We could have a {{public static final String SUPPORTED_TYPES_MSG = ...}} in
{{MetricType}} to create this String-ified list only once on class load.
* JMX, in general, doesn't seem to be working as documented (in the ref guide)
for the examples on this branch
** ie: {{bin/solr -e techproducts -Dcom.sun.management.jmxremote}} + jconsole
shows no (solr) level metrics/stats.
** It appears this is because (on this branch) solr metrics are only exposed
via JMX if there is a {{SolrJmxReporter}} configured in {{solr.xml}} ?
*** if so this is a pretty big change, and definitely needs some docs/release
noting it.
*** It also raises some big questions:
**** Should {{solr/server/solr/solr.xml}} be updated to include
{{SolrJmxReporter}} so it works with the examples ?
**** At a minimum should it be updated to conditionally set a reporter based on
sysprops by {{bin/solr}} based on {{ENABLE_REMOTE_JMX_OPTS}} ?
** Personally, I think we should consider implicitly registering a
{{SolrJmxReporter}} based on the same criteria that are currently checked on
master when an empty {{<jmx/>}} is found:
{{null!=JmxUtil.findFirstMBeanServer()}}
*** that way if users configure a JMX MBeanServer via {{bin/solr}} or
{{solr.in.sh}} options, they'll automatically get Solr Metrics exposed via JMX
*** the only reason the _*very*_ legacy {{JmxMonitoredMap}} based code was
written to require {{<jmx/>}} before doing any of this was in case a user was
deplying Solr to an appserver along with many other apps, and you *only* wanted
JMX stats from your appserver or other apps -- but not Solr.
**** That ship has sailed and I don't think we need to worry about it.
**** if people start {{bin/solr}} with options to enable JMX, let's go ahead
and expose Solr Metrics via JMX
**** if, for some reason, some users really want jetty level JMX stats, but not
solr level JMX, then let's offer some type of option to _disable_ Solr level
JMX metric reporting ... perhaps a {{NoopSolrJmxReporter}} they can be
configure in {{solr.xml}} ?
* SolrConfig & JmxMonitoredMap
** with these changes, it's very clear that {{<jmx .../>}} configuration in
{{solrconfig.xml}} is now deliberately ignored, but...
*** nearly all of the sample (and many of the test) {{solrconfig.xml}} files on
this branch still include/document {{<jmx .../>}} and it's varous options. All
of that still needs cleaned up on this branch before merging to master.
*** at a minimum: if a {{<jmx/>}} XML node is found in a {{solrconfig.xml}}
file, Solr should log a WARN/ERROR w/message indicating that syntax is no
longer supported & being ignored (and give people a pointer to the new way to
configure JMX)
** how much of the existing config options that {{<jmx .../>}} supports
are/should be available with {{SolrJmxReporter}} in {{solr.xml}} ?
** the old style config supported *per-core* {{agentId}}, {{serviceUrl}}, or
{{rootName}} attributes...
*** {{rootName}} doesn't seem to be supported in {{SolrJmxReporter}}
**** this is probably fine.
**** that option dates back to deploying multiple solr.war files to a single
appserver, something we're not supporting anymore
*** the {{agentId}} option was created to address the possibility of multiple
JMX MBeanServers existing in the JVM on startup (possible because multiple apps
each launched their own) and this way you could deterministicly indicate which
one Solr should service it's stats from.
**** it looks like this is currently supported by {{SolrJmxReporter}}, but i'm
not sure if it needs to be? i guess it doesn't hurt?
*** how {{serviceUrl}} is handled is something I'm more on the concerned
about...
**** the use case motivation of supporting this attribute in the {{<jmx ...
/>}} config syntax was that the JVM might/might-not be configured to service a
platform level MBeanServer and serve jVM metrics, but each individual core
could individually run it's own MBeanServer (w/it's own port & security
options) -- to better isolate what stats diff JMX clients could see.
**** This {{serviceUrl}} feature never really evolved with SolrCloud, but
perhaps now is the time when it should?
**** As implemented in {{SolrJmxReporter}} this appears (AFAICT) to now only be
configurable at a "container wide" level (in {{solr.xml}})
**** It would be nice if there was a way to configure multiple JMX MBeanServer
instances (on each solr node) that only expose metrics from specific
collections (such that metrics from those collections would not be available
via the platform MBeanServer even if it was enabled via started by the command
line options).
***** Not sure if/where/how it would make sense to expose some configuration
like this, or even if it's worth pursuing (let alone pursuing right now) but I
wanted to point out the original point of {{serviceUrl}} in case anyone has any
ideas.
***** Could we perhaps enhance the {{SolrMetricReporter}} base class so that
all types of reporters could be configured with inclusion/exclusion rules based
on collection name?
* Testing JMX w/solr.xml changes...
** I manually modified {{solr/server/solr/solr.xml}} to include a
{{SolrJmxReporter}} to continue to test JMX support
** With this change, using {{bin/solr -e techproducts
-Dcom.sun.management.jmxremote}} + jconsole did start to expose a lot of
metrics via JMX -- but many of the solr realated MBeans exposed don't seem very
useful...
*** Notably: when I went to look at the {{CACHE}} related MBeans (ex:
filterCache), each one was exposing only a single "Attribute" named "Value"
(with MBeanAttributeValueInfo.Type==Object) which appears to contain the
toString (or perhaps an array of name=value strings?) of the various cache
stats (ie: lookups, cumulative_lookups, hitratio, etc...)
*** This is vastly different from master where this sort of toString info is
used as the MBean's _Description_ but each one of those individual "metrics"
are exposed as individual MBean _Attributes_ with the expected
MBeanAttributeValueInfo.Type (Long, Long, Float, etc...)
*** Even in the single "Value" Attribute that is exposed, some really useful
info currently available as MBean Attributes seems to be missing, notably:
"description" (from the toString()), and "name" (the class implementing the
cache)
*** Not being able to access the individual (strongly typed!) cache "stats" as
MBean Attributes seems like a major step backwards?
** perplexingly, w/ the modified {{solr.xml}} file, running {{bin/solr -e
techproducts}} (NOTE: no command line args to use JMX) still caused Solr to run
an MBeanServer that I could connect to with jconsole and use to view the same
solr metrics.
*** presumably this is because of the this code in {{SolrJmxReporter}} which
doesn't really make sense to me....{code}
ManagementFactory.getPlatformMBeanServer(); // Ensure at least one
MBeanServer is available.
{code}
*** Setting asside my earlier suggestions that we should consider changing our
"default" behavior when {{bin/solr}} / {{solr.in.sh}} is configured to run the
JVM with JMX enabled, this code as is doesn't follow the existing precendent of
{{JmxMonitoredMap}} to be a no-op unless:
**** an MBeanServer is started at the process/appserver level (ie: command line
args)
**** an agentId or serviceUrl is explicitly configured
*** In general it seems like the the _intended_ where/how/why to configure Solr
to publish metrics via JMX is now fairly confusing...
**** if the intent here is that moving forward users _must_ configure a
{{SolrJmxReporter}} in {{solr.xml}}, and they _must_ configure a {{serviceUrl}}
at that point to control access, then we definitely need to think through how
we're deprecating/replacing the various things {{bin/solr}} currently does with
{{ENABLE_REMOTE_JMX_OPTS=true}} (and if/how serviceUrl's can support
configuration of things like {{com.sun.management.jmxremote.local.only}},
etc... -- i have no idea if they can)
*** My 2cents: expanding on my suggestion above regarding solr's default JMX
behavior...
**** I think all calls to {{ManagementFactory.getPlatformMBeanServer()}} should
be removed from {{SolrJmxReporter}}, and that class should be a No-Op unless:
either agentId/serviceUrl is explicitly configured *OR* an already existing
MBeanServer is found (via {{JmxUtil.findFirstMBeanServer()}})
**** I think that if no metrics reporters (or perhaps specifically no
{{SolrJmxReporter}} instances?) are configured in {{solr.xml}}, CoreContainer
(or some equivilent) should check {{JmxUtil.findFirstMBeanServer()}} and
implicily register an {{SolrJmxReporter}} instance if the MBeanServer is
non-null.
**** that way command line JVM options (should be) the end all be decider as to
whether a Platform MBeanServer is launched, and if so: then Solr metrics be
exposed via JMX by default. If users want something more fancy they can
override with explicit {{SolrJmxReporter}} / {{SolrJmxNoopReporter}}
configurations
* other uses of {{ManagementFactory.getPlatformMBeanServer()}} ...
** I see now that as part of the earlier metrics work, {{SolrDispatchFilter}}
was also modified to call {{ManagementFactory.getPlatformMBeanServer()}} --
evidently so that the Metrics API can access JVM level MBeans?
*** This explains why (on both this branch and on master) running {{bin/solr}}
is all it takes to be able to connect to the process via jconsole -- even w/o
{{solr.xml}} changes or command line args requesting that the JVM enable JMX
services.
*** I don't think it's a good idea for Solr to be forcing the JVM to spin up
MBeanServer instances & accepting JMX connections, w/o the user
running/configuring solr explicitly requesting that -- particularly in the case
of this {{SolrDispatchFilter}} code which seems to run even if the user doesn't
care about metrics/JMX at all!
*** It seems like the {{BufferPoolMetricSet}} and {{OperatingSystemMetricSet}}
classes used in {{SolrDispatchFilter}} should probably be using the various
{{*MXBean}} impls available directly from {{ManagementFactory}} w/o needing an
MBeanServer to be running
**** see examples of doing this in {{SystemInfoHandler}}
*** but if an {{MBeanServer}} is really the only way to bridge these type of
OS/JVM level info into the metrics code then at a minimum we should change
{{SolrDispatchFilter}} to also use {{JmxUtil.findFirstMBeanServer()}} and if
the {{MBeanServer}} returned is null skip registering these metrics (in their
place: log an info message, and/or register a simple String constnat metric,
that certain metrics are only available if JMX is enabled)
** In general, I think it's a really bad idea for any (non-test) Solr level
code to be calling {{ManagementFactory.getPlatformMBeanServer()}} ... we should
probably consider marking it as a forbidden-api!
* I'm -1 on removing {{TestJmxIntegration}}
** if we need to change the JMX ObjectNames we look for to match the new
metrics based code, and/or change the test init to ensure {{SolrJmxReporter}}
is used by the CoreContainer (depending on what decisions are made about
default behavior) -- then so be it.
** but it seems really important to have a simple test like this sanity
checking some simple Solr metrics via a JMX {{MBeanServer}}
*** Yes i see {{CollectionsAPIDistributedZkTest}} and {{SolrJmxReporterTest}}...
**** {{CollectionsAPIDistributedZkTest}} is trusting the JMX stats to check
that instanceDirs don't collide, not doing anything to ensure that JMX is
reutrning expected values. (Independent of the current Jira, this test should
almost certaily be re-written to check this some other way)
**** {{SolrJmxReporterTest}} creates some random metrics -- that's not really a
good _integration_ test of the metrics/JMX code to check that external clients
can find specific Solr metrics like Searcher's "numDocs" (or filterCache's
"lookups"), or that the values of those metrics are updated when expected (and
to the expected value) based on actions that happen in solr (ie: docs added to
the index, queries executed with an fq, etc...)
* RequestHandlersTest
** testStatistics has been removed, but why isn't there a new equivilent
replacement test/asserts using some internal metrics API?
* CollectionsAPIDistributedZkTest
** Note this change...{noformat}
} catch (Exception e) {
+ log.info(e.toString());
// ignore, just continue - probably a "category" or "source"
attribute
// not found
}
{noformat}
** if this comment is still valid: then logging the exception here seems like
shot term debuging code that should not be committed
** if this comment is no longer valid based on the other changes in the loop
above this, then the comment should be deleted
** if the log message is still useful, then it should log the {{Exception}}
(w/stack trace), not the {{toString()}}
* MetricUtils
** all of these (newly) public methods should have javadocs explaining their
function / intended purpose
* SolrCore
** using {{\_notset\_}} and {{\_auto\_}} as special values here for the
collection & shardId metrics seems like a really bad idea. In both cases...
*** if we're not in cloud mode, then it seems like we shouldn't report a
collection name or a shardId as part of the metrics at all
*** if we're in cloud mode, then isn't it an error if either collection name or
shardId are null?
**** even if there is some situation i'm not thinking of where these might be
legitimately null (in cloud mode), can't/shouldn't we just return {{NULL}} for
these metric instead of trying to use magic string values that client code
might missinterpret?
**** if {{NULL}} isn't an option, then throwing an exception seems better then
using magic strings that might be missinterpreted/missused by client code
**** IE: I'd rather some automated client code get a failure trying to access
the metric named "collection" then get a value that the client code might try
to use down the line to query for a collection and gets a weird 404 error later.
> SolrInfoMBean-s category and hierarchy cleanup
> ----------------------------------------------
>
> Key: SOLR-9959
> URL: https://issues.apache.org/jira/browse/SOLR-9959
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: metrics
> Affects Versions: master (7.0)
> Reporter: Andrzej Bialecki
> Assignee: Andrzej Bialecki
> Priority: Blocker
> Fix For: master (7.0)
>
> Attachments: SOLR-9959.patch
>
>
> SOLR-9947 changed categories of some of {{SolrInfoMBean-s}}, and it also
> added an alternative view in JMX, similar to the one produced by
> {{SolrJmxReporter}}.
> Some changes were left out from that issue because they would break the
> back-compatibility in 6.x, but they should be done before 7.0:
> * remove the old JMX view of {{SolrInfoMBean}}-s and improve the new one so
> that it's more readable and useful.
> * in many cases {{SolrInfoMBean.getName()}} just returns a FQCN, but it could
> be more informative - eg. for highlighter or query plugins this could be the
> symbolic name of a plugin that users know and use in configuration.
> * top-level categories need more thought. On one hand it's best to minimize
> their number, on the other hand they need to meaningfully represent the
> functionality of components that use them. SOLR-9947 made some cosmetic
> changes, but more discussion is necessary (eg. QUERY vs. SEARCHHANDLER)
> * we should consider removing some of the methods in {{SolrInfoMBean}} that
> usually don't return any useful information, eg. {{getDocs}}, {{getSource()}}
> and {{getVersion()}}.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]