This is a very good point; I have been thinking about this for years.

Node failures should be easy to monitor with OS-level services. Latency
spikes are a totally different matter.
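
For the node failure part, a minimal sketch of a check that an external
service (cron, Nagios, or similar) could run against the cluster health
API. The base URL, the expected node count, and the helper names
health_alerts and fetch_health are assumptions for illustration, not part
of any existing plugin.

```python
import json
import urllib.request


def health_alerts(health, expected_nodes):
    """Return alert strings for one parsed _cluster/health response."""
    alerts = []
    if health.get("timed_out"):
        alerts.append("cluster health request timed out")
    status = health.get("status")
    if status != "green":
        alerts.append("cluster status is %s" % status)
    nodes = health.get("number_of_nodes", 0)
    if nodes < expected_nodes:
        alerts.append("only %d of %d nodes joined" % (nodes, expected_nodes))
    return alerts


def fetch_health(base_url="http://localhost:9200"):
    """Fetch cluster health over HTTP; base_url is an assumed default."""
    url = base_url + "/_cluster/health"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling health_alerts(fetch_health(), expected_nodes=3) from a scheduled
job and mailing any non-empty result would cover the simple case; a node
that cannot be reached at all shows up as an exception from fetch_health.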

It is very, very hard to measure latency anomalies correctly. Just
consider the skew introduced by programming errors, or by the hostile
environments JVMs run in (clocks, OSes, VMs, ...). If anomalies are
detected incorrectly, alerts are either missed or false, and all of the
effort leads only to annoyance and frustration.

I recently read about Gil Tene's LatencyUtils

https://github.com/LatencyUtils/LatencyUtils

https://groups.google.com/forum/#!topic/mechanical-sympathy/oZSv5QnpAYs

which looks like a promising tool for capturing latency anomalies in
histograms.
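
To illustrate the kind of skew such a tool is meant to address, here is a
rough sketch of back-filling the samples that a stalled measuring thread
never recorded (often called coordinated omission). This is not
LatencyUtils code, which is a Java library; the function names and the
plain-list "histogram" are assumptions made only for this example.

```python
def record_corrected(samples, latency_ms, expected_interval_ms):
    """Record latency_ms plus synthetic samples hidden by a stall.

    If one request took much longer than the expected interval between
    requests, the requests that should have been issued meanwhile would
    also have waited, so they are back-filled with decreasing latencies.
    """
    samples.append(latency_ms)
    if expected_interval_ms <= 0:
        return
    missing = latency_ms - expected_interval_ms
    while missing >= expected_interval_ms:
        samples.append(missing)
        missing -= expected_interval_ms


def percentile(samples, pct):
    """Nearest-rank percentile (integer pct) of a non-empty list."""
    ordered = sorted(samples)
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]
```

With 99 requests of 1 ms and a single 100 ms stall at an expected interval
of 10 ms, the uncorrected 95th percentile stays at 1 ms, while the
corrected one reflects the stall.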

Some of this could probably be implemented as an ES plugin, but I haven't
tried LatencyUtils yet, and how to connect it to ES metrics is still an
open question for me.
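
Without any plugin, a crude starting point is to sample the search
counters from the nodes stats API (/_nodes/stats) twice and compare them.
The sketch below assumes both arguments are parsed JSON responses from
that endpoint; the function names and the threshold are hypothetical. It
yields only a mean per interval, so short spikes in the tail can still be
hidden, which is exactly the limitation a histogram-based tool addresses.

```python
def search_latency_ms(prev, curr):
    """Mean query latency per node between two _nodes/stats snapshots."""
    result = {}
    for node_id, node in curr.get("nodes", {}).items():
        before = prev.get("nodes", {}).get(node_id)
        if before is None:
            continue  # node joined between samples, no baseline yet
        s0 = before["indices"]["search"]
        s1 = node["indices"]["search"]
        queries = s1["query_total"] - s0["query_total"]
        millis = s1["query_time_in_millis"] - s0["query_time_in_millis"]
        # Skip idle nodes and counters that were reset by a restart.
        if queries > 0 and millis >= 0:
            result[node_id] = millis / queries
    return result


def spiking_nodes(latencies, threshold_ms):
    """Return the sorted ids of nodes whose mean latency exceeds the threshold."""
    return sorted(n for n, ms in latencies.items() if ms > threshold_ms)
```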

Jörg


On Thu, Mar 6, 2014 at 7:24 PM, T Vinod Gupta <[email protected]> wrote:

> is there a plugin or api support for monitoring ES key metrics and
> alerting the dev ops about situations when some node in a cluster fails or
> there is a spike in latency due to whatever reason?
>
> what are the best practices here and what do people usually do?
>
>
