Recommendations for health monitoring

Joel Potischman Mon, 23 Mar 2015 08:12:00 -0700

We currently monitor our app by having a monitoring tool (Pingdom) retrieve 
a health page from our app that retrieves and displays the Elasticsearch 
cluster info, e.g.

{
  "status": 200,
  "name": "whatever",
  "cluster_name": "whatever_dev",
  "version": {
    "number": "1.4.4",
    "build_hash": "c38f773fc81201d1abdfde1ca2746fab58efa912",
    "build_timestamp": "2015-02-19T13:05:36Z",
    "build_snapshot": false,
    "lucene_version": "4.10.3"
  },
  "tagline": "You Know, for Search"
}

If the monitoring process can't reach our app, or our app can't reach
Elasticsearch, we get an error and an alert. However, this tells us nothing
about node or index health. I've made a page that calls
ClusterClient.health(level='indices'), but I want to confirm two things:

1. Is this sufficient for surfacing any issue with our Elasticsearch
infrastructure?
2. Does this call block query requests or backups, consume a lot of
resources, or otherwise have enough impact that we wouldn't want to be
calling it every 60 seconds, 24x7?
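
To make the first question concrete, here is a minimal sketch of how the page
could turn the health response into a list of problems. It assumes the
response is the dict returned by the health call, with the usual cluster
health fields ("status", "timed_out", "number_of_nodes", "unassigned_shards",
and a per-index "indices" map when level='indices' is used). The function
name and the expected_nodes parameter are hypothetical, not part of any
client library.

```python
def find_health_problems(health, expected_nodes=None):
    """Return a list of human-readable problems found in a cluster health dict.

    `health` is assumed to be the parsed cluster health response.
    `expected_nodes` is a hypothetical setting: the node count we expect to see.
    """
    problems = []

    # The health call itself can time out while waiting on the master.
    if health.get("timed_out"):
        problems.append("health request timed out")

    # Anything other than green means at least one shard copy is not allocated.
    status = health.get("status")
    if status != "green":
        problems.append("cluster status is %s" % status)

    # A node dropping out shows up as a lower node count.
    nodes = health.get("number_of_nodes", 0)
    if expected_nodes is not None and nodes < expected_nodes:
        problems.append("only %d of %d nodes present" % (nodes, expected_nodes))

    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0:
        problems.append("%d unassigned shards" % unassigned)

    # Per-index detail is only present when level='indices' was requested.
    for name, index in sorted(health.get("indices", {}).items()):
        if index.get("status") != "green":
            problems.append("index %s is %s" % (name, index.get("status")))

    return problems
```

An empty list would mean nothing to alert on. Note that a single-node
cluster with replicas configured normally reports yellow, so a dev
environment may need a looser status check than the one sketched here.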

We don't need our monitoring page to give us a full diagnosis of all
conceivable issues. We just need it to trigger an alert that there *is* an
issue, so we know we have some work to do, while having minimal impact on
overall application performance.
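
As a rough illustration of "alert only, no diagnosis", the page could reduce
everything to an HTTP status code. This sketch assumes the monitoring tool
treats any non-200 response as a failure, and that 503 is an acceptable code
for "unhealthy". health_page and fetch_health are hypothetical names;
fetch_health stands in for whatever callable performs the cluster health
request.

```python
import json


def health_page(fetch_health):
    """Return (http_status, body) for the monitoring page.

    `fetch_health` is a hypothetical zero-argument callable that returns the
    cluster health dict, e.g. lambda: es.cluster.health(level='indices')
    where `es` is an existing client instance (assumed, not shown here).
    """
    try:
        health = fetch_health()
    except Exception as exc:
        # Elasticsearch unreachable or the call failed: report unhealthy.
        return 503, json.dumps({"ok": False, "error": str(exc)})

    # Healthy only if the call completed and the cluster reports green.
    ok = health.get("status") == "green" and not health.get("timed_out")
    body = json.dumps({"ok": ok, "status": health.get("status")})
    return (200 if ok else 503), body
```

Keeping the body small means the alert only says that something is wrong;
the details can be looked up afterwards.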

Any recommendations on what we should monitor to achieve those two mandates
would be greatly appreciated.

Thanks,

-joel
