On Mon, 13 Jan 2014, Matthew Barr wrote:

So, I’ve recently been reading up on the #monitoringsucks tags, their 
responses, and some of the various things that have come out of it.
I’m in a new shop, AWS based, so many of the old standbys aren’t quite as much 
of an obvious call anymore.

What I’m now trying to figure out is what I’m missing, or would lose, by going 
with a newer paradigm for monitoring.


Anyone using Riemann yet?   Do you still use nagios / sensu / etc?

Basically, Riemann operates on a stream of metrics, vs. relying on a check 
every X min.

I’m trying to determine what I’ve lost by not implementing a nagios style system 
that basically runs checks on a cron-like schedule.   (I’m pretty confident 
I’m not losing the alerting & state stuff.)


For example: I had initially thought I’d lose a check of the web site every X 
min, but the load balancer does that anyways, and that produces logs and metrics 
about page response time.

I think that as you scale, you start getting even more data & metrics, and the 
need for manual injection of jobs becomes smaller.


I’m curious about people’s thoughts on this…

You can eliminate a lot of active checks if you watch the logs for normal activity. You can even set up your alerts so that, instead of just calling a person, the system first runs a monitoring probe in case the traffic has simply dropped off.
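
As a rough sketch of that idea (not tied to Riemann or any particular tool), the decision could look like the following. The function name, the thresholds, and the probe callable are all hypothetical; the request timestamps are assumed to have been parsed from your access logs already.

```python
from typing import Callable, Sequence


def decide_action(
    request_times: Sequence[float],
    now: float,
    window_s: float,
    min_requests: int,
    probe: Callable[[], bool],
) -> str:
    """Return "ok", "quiet", or "page" (hypothetical labels).

    request_times: timestamps (seconds) of requests seen in the logs.
    probe: an active check, called only when traffic looks abnormally low.
    """
    # Count the requests that fall inside the recent window.
    recent = [t for t in request_times if now - window_s <= t <= now]
    if len(recent) >= min_requests:
        return "ok"  # normal activity seen passively, no active check needed
    # Traffic is below the expected floor: probe before waking anyone up.
    return "quiet" if probe() else "page"
```

With this shape, the active probe only runs when the passive signal goes quiet, so a slow night ends in "quiet" rather than a phone call.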

One thing to remember: your load balancer's test does not check that the product works, just that the webserver works. You need other tests to make sure that all the web hits you are getting aren't just generating a 'database error, try again later' response ;-)
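
A minimal sketch of such a test, assuming you already have the status code and body of a response in hand. The function name, the default error marker, and the expected-text parameter are illustrative assumptions, not part of any particular monitoring tool.

```python
from typing import Sequence


def looks_healthy(
    status: int,
    body: str,
    must_contain: str,
    error_markers: Sequence[str] = ("database error",),
) -> bool:
    """Check that a response shows the product working, not just the webserver."""
    if status != 200:
        return False
    text = body.lower()
    # A 200 that carries an error page is still a failure.
    if any(marker.lower() in text for marker in error_markers):
        return False
    # Require some text that only appears when the page really rendered.
    return must_contain.lower() in text
```

The point of the `must_contain` argument is that an empty or half-rendered page with a 200 status also fails, which a plain load balancer health check would typically miss.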

David Lang
_______________________________________________
Discuss mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
 http://lopsa.org/
