[jira] [Commented] (SOLR-9731) Add jvm-wide JMX statistics for Solr

Shawn Heisey (JIRA) Fri, 04 Nov 2016 13:05:25 -0700

    [ https://issues.apache.org/jira/browse/SOLR-9731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15637520#comment-15637520 ]


Shawn Heisey commented on SOLR-9731:
------------------------------------

Sounds like a really good idea.  Any indicators of health that aren't horribly 
disruptive should be tracked and made available. Codahale metrics is already a 
dependency, and it can give us percentiles on any stats where they make sense.

Thinking out loud (and this might be per-core, not JVM-wide, but I don't have 
anywhere else to discuss it right now):

I wonder if there's any way to detect when and how much actual disk I/O is 
required to satisfy a query.  I suspect that this information is not readily 
available to Java, and even if it is, that it would need to be tracked down in 
the Lucene layer and made available via public getters that Solr could query.

Lucene *might* be able to track statistics about how many nanoseconds it takes 
for reading X bytes from MMap, and that information could ultimately be 
interpreted by a user to indicate whether or not their disk caching is 
effective.  One problem with that idea: Lucene's core functionality has no 
dependencies, so that feature would probably have to be written using native 
classes/methods included with the JVM, not an external dependency like the 
metrics package.  It would be really awesome if we could see median and 
percentile info about how long the MMap accesses are taking.  We'd be able to 
use that info to determine whether a performance issue is due to insufficient 
disk cache.
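A minimal sketch of the measurement idea follows. It uses only JDK classes, in keeping with Lucene core having no dependencies. It times bulk reads from a MappedByteBuffer with System.nanoTime and reports nearest-rank median and percentile values. The class and method names (MmapReadTiming, timeReads, percentile) are hypothetical. This is not Lucene's MMapDirectory or any existing Lucene API. Per-read timer overhead is also significant for small reads.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: times reads from an mmap'd file.
public class MmapReadTiming {

    /** Nearest-rank percentile (pct in 0..100) over a sorted copy of the samples. */
    static long percentile(long[] samples, double pct) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    /** Times `reads` reads of `chunk` bytes each; files under 2 GiB only (single mapping). */
    static long[] timeReads(Path file, int chunk, int reads) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long slots = ch.size() / chunk;
            if (slots == 0) throw new IllegalArgumentException("file smaller than one chunk");
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] buf = new byte[chunk];
            long[] nanos = new long[reads];
            for (int i = 0; i < reads; i++) {
                // Independent view so each read starts at its own chunk offset.
                ByteBuffer view = map.duplicate();
                view.position((int) ((i % slots) * chunk));
                long start = System.nanoTime();
                view.get(buf, 0, chunk); // may page-fault if the data is not in the OS cache
                nanos[i] = System.nanoTime() - start;
            }
            return nanos;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmap-timing", ".bin");
        try {
            Files.write(tmp, new byte[1 << 20]); // 1 MiB of zeros
            long[] nanos = timeReads(tmp, 4096, 1000);
            System.out.println("p50 ns = " + percentile(nanos, 50));
            System.out.println("p99 ns = " + percentile(nanos, 99));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

A file that was just written will usually sit in the OS page cache, so this demo shows mostly cached latencies. A long tail at p99 on a real index is the kind of signal that could point to insufficient disk cache.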


> Add jvm-wide JMX statistics for Solr
> ------------------------------------
>
>                 Key: SOLR-9731
>                 URL: https://issues.apache.org/jira/browse/SOLR-9731
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Erick Erickson
>
> The statistics that can currently be gathered via JMX tend to be 
> core-specific, making monitoring "how is the Solr node doing" harder than it 
> needs to be. This JIRA is about exploring what it would take for 
> instance-wide statistics to be JMX-enabled.
> I'm imagining cumulative stats like:
> > How many Solr<->Solr communications errors have there been?
> > How many Solr<->ZK communication errors have there been?
> > How many full synchronizations have happened across all replicas?
> > Operations people, fill in your favorite health monitoring bit here.
> What do people think? Is JMX even the right thing? We have an admin end-point 
> for gathering information, but that's not as "operations friendly".
> I'm open to any suggestions for how/where to implement this, whether there 
> are any huge "gotchas", bottleneck concerns, etc.
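The following sketch shows roughly what JVM-wide (rather than per-core) statistics over JMX could look like. It registers one MBean on the platform MBean server, and that MBean's attributes are cumulative, node-wide counters. The interface, the counter names, and the ObjectName "example.solr:type=NodeStats" are all hypothetical. Solr's actual JMX naming and metrics wiring are not assumed.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.LongAdder;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

public class NodeStatsDemo {

    /** Hypothetical management interface: each getter becomes a read-only JMX attribute. */
    public interface NodeStatsMBean {
        long getSolrToSolrErrors();
        long getSolrToZkErrors();
        long getFullSyncs();
    }

    /** Hypothetical JVM-wide counters that any code in the node could increment. */
    public static class NodeStats implements NodeStatsMBean {
        final LongAdder solrToSolrErrors = new LongAdder();
        final LongAdder solrToZkErrors = new LongAdder();
        final LongAdder fullSyncs = new LongAdder();

        public long getSolrToSolrErrors() { return solrToSolrErrors.sum(); }
        public long getSolrToZkErrors() { return solrToZkErrors.sum(); }
        public long getFullSyncs() { return fullSyncs.sum(); }
    }

    public static void main(String[] args) throws Exception {
        NodeStats stats = new NodeStats();
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("example.solr:type=NodeStats"); // hypothetical name
        // StandardMBean wraps the object explicitly, avoiding the Foo/FooMBean naming rule.
        server.registerMBean(new StandardMBean(stats, NodeStatsMBean.class), name);

        stats.solrToZkErrors.increment();
        stats.fullSyncs.add(2);

        // A JMX client (jconsole, VisualVM, a monitoring agent) reads the same attributes.
        System.out.println("SolrToZkErrors = " + server.getAttribute(name, "SolrToZkErrors"));
        System.out.println("FullSyncs = " + server.getAttribute(name, "FullSyncs"));
    }
}
```

LongAdder is used because many request threads may increment the counters concurrently, while reads (a JMX poll) are comparatively rare and only need the current sum.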



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
