[ 
https://issues.apache.org/jira/browse/SOLR-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956053#comment-15956053
 ] 

Hoss Man commented on SOLR-9959:
--------------------------------


(NOTE: comments below are in a mishmash order as i jumped around the code, so 
they can be somewhat redundent as i re-thought about diff concepts while 
looking at diff classes)

* SolrJmxReporter
** should these really be warnings, or just info?
*** {{log.warn("No serviceUrl or agentId was configured, using first 
MBeanServer.", mBeanServer);}}
*** {{log.warn("No JMX server found. Not exposing Solr metrics via JMX.");}}
*** ....in the prior code, warning might have made sense -- but in the new 
code,  seems like it should only be a warning if agentId is specified, or if 
serviceUrl can't be created?
** it didn't occur to me the last time i reviewed this code, but if it's 
possible for people to configure multiple SolrJmxReporter instances in 
solr.xml, then we should almost certainly support {{rootName}} like 
JmxMonitorMap did, otherwise won't the multiple SolrJmxReporter instances 
overwrite eachother if they use the same MBeanServer?
*** see related comments below regarding JmxObjectNameFactory
** as for *why* a person might want to configure multiple SolrJmxReporter 
instances, that goes back to my previous question about if/why we should 
support {{serviceUrl}} in SolrJmxReporter...
*** since SolrJmxReporter is now at the container level, the only value i see 
is if there's a way to configure Reporters to filter which collections they 
expose
*** so people might configure multiple SolrJmxReporter instances w/diff 
serviceUrls that expose the metrics for diff solr collections to diff end 
consumers
*** is this currently possible?
** NOTE: there is some sort of bug -- i didn't trace down the root cause -- 
causing multiple SolrJmxReporter instances to be inited on startup,
*** run {{bin/solr -e techproducts -Dcom.sun.management.jmxremote}} and very 
early in the logs you'll see...{noformat}
INFO  - 2017-04-04 22:46:40.787; [   ] org.apache.solr.core.SolrXmlConfig; 
Loading container configuration from 
/home/hossman/lucene/dev/solr/example/techproducts/solr/solr.xml
INFO  - 2017-04-04 22:46:40.833; [   ] org.apache.solr.core.SolrXmlConfig; 
MBean server found: com.sun.jmx.mbeanserver.JmxMBeanServer@66d3c617, but no JMX 
reporters were configured - adding default JMX reporter.
WARN  - 2017-04-04 22:46:41.252; [   ] 
org.apache.solr.metrics.reporters.SolrJmxReporter; No serviceUrl or agentId was 
configured, using first MBeanServer.
INFO  - 2017-04-04 22:46:41.269; [   ] 
org.apache.solr.metrics.reporters.SolrJmxReporter; JMX monitoring enabled at 
server: com.sun.jmx.mbeanserver.JmxMBeanServer@66d3c617
WARN  - 2017-04-04 22:46:41.269; [   ] 
org.apache.solr.metrics.reporters.SolrJmxReporter; No serviceUrl or agentId was 
configured, using first MBeanServer.
INFO  - 2017-04-04 22:46:41.270; [   ] 
org.apache.solr.metrics.reporters.SolrJmxReporter; JMX monitoring enabled at 
server: com.sun.jmx.mbeanserver.JmxMBeanServer@66d3c617
WARN  - 2017-04-04 22:46:41.270; [   ] 
org.apache.solr.metrics.reporters.SolrJmxReporter; No serviceUrl or agentId was 
configured, using first MBeanServer.
INFO  - 2017-04-04 22:46:41.276; [   ] 
org.apache.solr.metrics.reporters.SolrJmxReporter; JMX monitoring enabled at 
server: com.sun.jmx.mbeanserver.JmxMBeanServer@66d3c617
{noformat}
*** and later in the logs, once the techproducts core is added...{noformat}
WARN  - 2017-04-04 22:46:43.608; [   x:techproducts] 
org.apache.solr.metrics.reporters.SolrJmxReporter; No serviceUrl or agentId was 
configured, using first MBeanServer.
INFO  - 2017-04-04 22:46:43.609; [   x:techproducts] 
org.apache.solr.metrics.reporters.SolrJmxReporter; JMX monitoring enabled at 
server: com.sun.jmx.mbeanserver.JmxMBeanServer@66d3c617
{noformat}
*** isn't there only suppose to be *ONE* (implicit) SolrJmxReporter ? ... and 
why would a/each new core cause a new SolrJmxReporter to be created/init'ed ?


* JmxObjectNameFactory
** NOTE: i realize some of these comments aren't specific to changes on this 
branch, but i noticed them while reviewing the JMX stuff a bit more...
** I'm confused as to why the "reporter" name is being include in all the 
ObjectNames ?
*** it's going to be the same for every bean reported by the (same) 
SolrJmxReporter (so pracitcally speaking with teh default implicit 
SolrJmxReporter instance there's this weird "default" somewhere in the 
hierarchical drill down of every MBean)
*** if we expect there to be multiple SolrJmxReporter instances (thus needing 
to disambiguate the beans), then that's exactly what the point of {{rootName}} 
was inthe existing code -- and giving each reporter it's own prefix/hierarchy 
in the MBean server seems better then having their beans intermixed and needing 
to look for the "reporter" attribute of the name to disambiguate
**** so i would suggest either we should add the {{rootName}} option to 
SolrJmxReporter, _OR_ perhaps automatically use the (sanitized) reporter name 
as the _prefix_ of all the MBean names (in place of "solr:" in the current 
names (although at first glance i'm not even clear on where that's currently 
coming from ... i don't see domain being set?)
**** either way, the if/when we add the implicit default SolrJmxReporter it 
should still use an effective rootName/prefix/domain/reportName/whatever of 
"solr" in it's ObjectNames
*** if we don't expect there to be multiple SolrJmxReporter instances, then it 
seems silly to bother putting "reporter" in the name at all
** likewise: i don't understand why SolrJmxReporter is passing 
{{this.hashCode()}} as an "instance" property to include in all ofthe 
ObjectNames
*** (assuming we fix the bug mentioned above about multiple SolrJmxReporter 
instances being init'ed automatically on startup) all MBeans registered by the 
same JmxObjectNameFactory will have the same "instance" value -- so there 
doesn't seem to be any need for it in the name
**** and even worse then the "reporter" name, this is something that will 
change every time the node is restarted, making it impossible for JMX clients 
to programatically access (expected) objects at "known" names
*** the commit when this was added is 4249c8: It's not 100% clear but I suspect 
the reason relates to the part of the commit msg that mentions "multiple core 
containers in a JVM"
**** for *end users* that doesn't seem like a use case we should really worry 
much about since we're not viewing solr as a "WAR" but a standalone app.
***** if we did want to worry about this use case, then that's exactly what 
setting the {{rootName}} on JmxMonitoredMap enabled users to do in the past to 
avoid collisions -- and we could likewise support that on SolrJmxReporter (or 
by using the reporter name as the prefix as suggested above) and the burden is 
on these advanced users to configure the SolrJmxReporter's in their multiple 
CoreContainer instances with unique names
**** if the concern is about our (cloud) *test* code (and having unique 
ObjectNames even when we're spinning up multiple jetty+CoreContainer instance 
in a single JVM process) then it seems like we should really solve that in our 
test code/scaffolding and not "pollute" the JMX MBean names usres see in 
production with numeric values that unique every time hte process is restarted.
***** first idea that comes to mind: the {{solr.xml}} files used in cloud tests 
(where we want to do JMX testing) can explicitly declare a SolrJmxReporter 
(rather then getting the implicit default) using a 
rootName/prefix/domain/reportName/whatever that includes{{"\${hostPort\}"}} ?
***** Cloud tests that aren't explicitly planning on checking JMX metrics (ie: 
don't force the creation of the platform MBeanServer before creating the 
MiniSolrCluster) don't need to worry about it



* OperatingSystemMetricSet + MetricsUtils.addMXBeanMetrics (vs. 
SystemInfoHandler)
** is there anyway we can refactor this so we're not copy/pasting so much code 
from SystemInfoHandler?
*** maybe a {{doOperatingSystemMXBeanIterator}} utility method somewhere that 
takes in a lambda to run for each bean property?
** at a minimum: we should staticly declare the list of class names we try once 
and re-use in both places -- so if we ever add more (new JVMs, OpenJDK renames 
the com.sun.* MXBeans, etc..) we don't overlook one list.


* JvmMetricsTest
** won't some of these tests fail on some (non-oracle) JVMs where those 
platform specific OS metrics may not be available?
** even if we hardcode a list of expected metrics, the test should probably 
group them based on the MXBean they are expected from 
({{java.**.OperatingSystemMXBean}} vs {{com.sun.**.OperatingSystemMXBean}} vs 
{{com.sun.**.UnixOperatingSystemMXBean}} etc...) and only check them (against 
the solr metrics API) if that source MXBean class can be found in the current 
classpath


* SolrInfoBean
** {{getMetrics()}} might be a missleadingly simplistic name for what this 
method does?
*** perhaps something like {{getMetricsSnapshot()}} (since it's a snapshot of 
the current values)
*** or {{getSimplifiedMetricsMap()}} ?
*** or maybe {{getMetricsAsSimpleStats()}} since that's the ultimate end 
purpose (ie: describe the usage, not the internals)
** if {{getMetricRegistry()}} is going to {{default}} to {{return null}}, then 
shouldn't {{getMetricNames()}} also {{default}} to {{return null;}} ?
** in general can we please beef up the javadocs of {{getMetricNames}} & 
{{registerMetricName}} to make their relationship and intended use more obvious 
(that was the most confusing part for me of reading through the recent 
changes)...
*** when/who calls {{registerMetricName}} (IIUC: metrics manager via callback 
when a SolrInfoBean register a new metric)
*** when/who calls {{getMetricNames}} & why (IIUC: by default, 
{{registerMetricName}} calls method to populate the map when new metrics are 
registered, and by default {{getMetrics()}} calls to read from the map to 
convert metrics to their simplified form)
** we may also want to make it clear in these javadocs that SolrInfoBeans 
can/should override these various methods to prevent some/all metric names from 
being added to this map if they don't want them exposed as part of the 
"SolrInfoBean stats" (ie: some plugins might want to only expose a handful a 
simple numeric/string stats via /admin/mbeans because of the limited API, but 
via the metrics reporting they report a lot more robust things.


* {{/admin/mbeans}}
** there's still a small disconnect between the expected/existing _structure_ 
of what {{/admin/mbeans}} returns and the code currently on the branch (which 
breaks the admin UI) ... there's an extra "map" with a single key (which looks 
like the global metric name?), whose value is a map with the actual stats as 
*** master: 
http://localhost:8983/solr/techproducts/admin/mbeans?stats=true&key=queryResultCache&indent=true&wt=json&cat=CACHE
 {noformat}
{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "solr-mbeans":[
    "CACHE",{
      "queryResultCache":{
        "class":"org.apache.solr.search.LRUCache",
        "version":"1.0",
        "description":"LRU Cache(maxSize=512, initialSize=512)",
        "src":null,
->      "stats":{
          "lookups":1,
          "hits":0,
          "hitratio":0.0,
          "inserts":1,
          "evictions":0,
          "size":1,
          "warmupTime":0,
          "cumulative_lookups":1,
          "cumulative_hits":0,
          "cumulative_hitratio":0.0,
          "cumulative_inserts":1,
          "cumulative_evictions":0}}}]}
{noformat}
*** branch: 
http://localhost:8983/solr/techproducts/admin/mbeans?stats=true&key=queryResultCache&indent=true&wt=json&cat=CACHE
 {noformat}
{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "solr-mbeans":[
    "CACHE",{
      "queryResultCache":{
        "class":"org.apache.solr.search.LRUCache",
        "description":"LRU Cache(maxSize=512, initialSize=512)",
->      "stats":{"CACHE.searcher.queryResultCache":{              // NOTE: 
NESTED MAP
            "lookups":3,
            "hits":2,
            "cumulative_evictions":0,
            "size":1,
            "hitratio":0.67,
            "evictions":0,
            "cumulative_lookups":3,
            "cumulative_hitratio":0.67,
            "warmupTime":0,
            "inserts":1,
            "cumulative_inserts":1,
            "cumulative_hits":2}}}}]}
{noformat}
** I'm not entirely sure if this is a bug in {{SolrInfoMBeanHandler}} or 
{{SolrInfoBean.getMetrics()}} or {{MetricUtils.convertMetrics}}
*** I'm guessing maybe the expectation is that {{SolrInfoBean.getMetrics()}} 
should be unwinding the map returned by {{MetricUtils.convertMetrics}} ?
**** Because IIUC {{getMetrics()}} should know that based on it's usage of 
{{convertMetrics}} there should only be one key in the map returned?

* MBeansHandlerTest
** since we're still using {{/admin/mbeans}} in the UI, this test sshould 
prbably added back to the branch
** i suspect as is it probably would hav caught the map-in-a-map bug as part of 
it's "diff" checking
** we should also go ahead and beef it up to do a simple spot check of an 
explicit cache stat value to verify has the correct NamedList/Map structure for 
accessing individual stat values.

* MetricUtils
** even if these methods aren't public (anymore), it's helpful to have javadocs 
so future developers understand the intent of a method (and don't acidently 
missues/break them in the future)

* StatsReloadRaceTest
** this smells bad...{code}
    // this is not guaranteed to exist right away after core reload - there's a
    // small window between core load and before searcher metrics are registered
    if (metrics.get(key) != null) {
      assertTrue(metrics.get(key) instanceof Long);
    }
{code}...shouldn't we at least poll that {{!= null}} check in a loop to ensure 
*eventually* the metrics are available and have the correct type?  (otherwise 
the test won't be useful in the event that something breaks down the road)


* Disabling JMX for solr, but having it on for the JVM
** now that SolrXmlConfig implicitly adds a SolrJmxReporter if (at least) one 
is not declared, don't we need a {{NoopSolrJmxReporter extends 
SolrJmxReporter}} (or marker interface) option so people who only want JVM 
level JMX metrics don't have to deal with the overhead of Solr level metrics in 
JMX?
** that would also eliminate the need for the {{private int jmxReporter}} hack 
in {{SolrMetricsIntegrationTest}} & similar tests
*** just include a {{NoopSolrJmxReporter}} in the {{solr-*reporter.xml}} files
*** or maybe the tests are just simpler this way with the extra "+1" ? ... not 
sure how i feel.






> SolrInfoMBean-s category and hierarchy cleanup
> ----------------------------------------------
>
>                 Key: SOLR-9959
>                 URL: https://issues.apache.org/jira/browse/SOLR-9959
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: metrics
>    Affects Versions: master (7.0)
>            Reporter: Andrzej Bialecki 
>            Assignee: Andrzej Bialecki 
>            Priority: Blocker
>             Fix For: master (7.0)
>
>         Attachments: SOLR-9959.patch, SOLR-9959.patch
>
>
> SOLR-9947 changed categories of some of {{SolrInfoMBean-s}}, and it also 
> added an alternative view in JMX, similar to the one produced by 
> {{SolrJmxReporter}}.
> Some changes were left out from that issue because they would break the 
> back-compatibility in 6.x, but they should be done before 7.0:
> * remove the old JMX view of {{SolrInfoMBean}}-s and improve the new one so 
> that it's more readable and useful.
> * in many cases {{SolrInfoMBean.getName()}} just returns a FQCN, but it could 
> be more informative - eg. for highlighter or query plugins this could be the 
> symbolic name of a plugin that users know and use in configuration.
> * top-level categories need more thought. On one hand it's best to minimize 
> their number, on the other hand they need to meaningfully represent the 
> functionality of components that use them. SOLR-9947 made some cosmetic 
> changes, but more discussion is necessary (eg. QUERY vs. SEARCHHANDLER)
> * we should consider removing some of the methods in {{SolrInfoMBean}} that 
> usually don't return any useful information, eg. {{getDocs}}, {{getSource()}} 
> and {{getVersion()}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to