[
https://issues.apache.org/jira/browse/SOLR-9731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15637520#comment-15637520
]
Shawn Heisey commented on SOLR-9731:
------------------------------------
Sounds like a really good idea. Any indicators of health that aren't horribly
disruptive should be tracked and made available. Codahale metrics is already a
dependency, and it can give us percentiles on any stats where they make sense.
Thinking out loud (and this might be per-core, not JVM-wide, but I don't have
anywhere else to discuss it right now):
I wonder if there's any way to detect when and how much actual disk I/O is
required to satisfy a query. I suspect that this information is not readily
available to Java, and even if it is, that it would need to be tracked down in
the Lucene layer and made available via public getters that Solr could query.
Lucene *might* be able to track statistics about how many nanoseconds it takes
for reading X bytes from MMap, and that information could ultimately be
interpreted by a user to indicate whether or not their disk caching is
effective. One problem with that idea: Lucene's core functionality has no
dependencies, so that feature would probably have to be written using native
classes/methods included with the JVM, not an external dependency like the
metrics package. It would be really awesome if we could see median and
percentile info about how long the MMap accesses are taking. We'd be able to
use that info to determine whether a performance issue is due to insufficient
disk cache.
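
A minimal sketch of the timing idea above, using only JDK classes (no metrics
library). It is not Lucene or Solr code: the class name MmapReadTimer, the
4096-byte chunk size, and the nearest-rank percentile method are all
assumptions made for illustration. It times chunked reads from a
MappedByteBuffer with System.nanoTime() and reports the median and 99th
percentile.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

/** Hypothetical helper (not part of Lucene or Solr): times chunked reads from a memory-mapped file. */
final class MmapReadTimer {

  /** Nearest-rank percentile, p in (0, 100], computed over a sorted copy of the samples. */
  static long percentile(long[] samples, double p) {
    if (samples.length == 0) {
      throw new IllegalArgumentException("no samples");
    }
    long[] sorted = samples.clone();
    Arrays.sort(sorted);
    int rank = (int) Math.ceil(p / 100.0 * sorted.length);
    return sorted[Math.max(rank, 1) - 1];
  }

  /**
   * Maps the file read-only and records how many nanoseconds each chunk takes to copy out.
   * Assumes the file is smaller than 2 GiB, the limit for a single mapping.
   */
  static long[] timeReads(Path file, int chunkBytes) throws IOException {
    try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
      long size = ch.size();
      MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
      int chunks = (int) ((size + chunkBytes - 1) / chunkBytes);
      long[] nanos = new long[chunks];
      byte[] dst = new byte[chunkBytes];
      for (int i = 0; i < chunks; i++) {
        int len = Math.min(chunkBytes, buf.remaining());
        long start = System.nanoTime();
        buf.get(dst, 0, len); // may page-fault and hit the disk if the page is not cached
        nanos[i] = System.nanoTime() - start;
      }
      return nanos;
    }
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("mmap-timing", ".bin");
    file.toFile().deleteOnExit();
    Files.write(file, new byte[1 << 20]); // 1 MiB of zeros as stand-in data
    long[] nanos = timeReads(file, 4096);
    System.out.printf("chunks=%d median=%dns p99=%dns%n",
        nanos.length, percentile(nanos, 50), percentile(nanos, 99));
  }
}
```

A freshly written temp file will almost certainly be in the page cache
already, so this only shows the mechanics. On a real index, a wide gap between
the median and the high percentiles would be the hint that some reads are
going to disk.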
> Add jvm-wide JMX statistics for Solr
> ------------------------------------
>
> Key: SOLR-9731
> URL: https://issues.apache.org/jira/browse/SOLR-9731
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Erick Erickson
>
> The statistics that can currently be gathered via JMX tend to be
> core-specific, making monitoring "how is the Solr node doing" harder than it
> needs to be. This JIRA is about exploring what it would take for
> instance-wide statistics to be JMX-enabled.
> I'm imagining cumulative stats like:
> > How many Solr<->Solr communication errors have there been?
> > How many Solr<->ZK communication errors have there been?
> > How many full synchronizations have happened across all replicas?
> > Operations people, fill in your favorite health monitoring bit here.
> What do people think? Is JMX even the right thing? We have an admin end-point
> for gathering information, but that's not as "operations friendly".
> I'm open to any suggestions for how/where to implement this, whether there
> are any huge "gotchas", bottleneck concerns, etc.