On Mon, Jan 13, 2014 at 5:56 PM, David Lang <[email protected]> wrote:
> On Mon, 13 Jan 2014, Matthew Barr wrote:
>
>> So, I’ve recently been reading up on the #monitoringsucks tags, their
>> responses, and some of the various things that have come out of it.
>> I’m in a new shop, AWS based, so many of the old standbys aren’t quite as
>> much of an obvious call anymore.
>>
>> What I’m now trying to figure out is what I’m missing, or would lose, by
>> going with a newer paradigm for monitoring.
>>
>>
>> Anyone using Riemann yet? Do you still use nagios / sensu / etc?
>>
>> -- Basically, Riemann operates on a stream of metrics, vs relying on a
>> check every X min.
>>
>> I’m trying to determine what I’ve lost by not implementing a nagios style
>> system, to basically cron checks. (the alerting & state stuff I’m pretty
>> confident I’m not losing.)
>>
>>
>> For example: I had initially thought I’d lose a check of the web site
>> every X min, but the load balancer does that anyways, and that triggers log
>> and metrics about page speed return.
>>
>> I think that as you scale, you start getting even more data & metrics,
>> and the need for manual injection of jobs becomes smaller.
>>
>>
>> I’m curious about people’s thoughts on this…
>>
>
> You can eliminate a lot of active checks if you watch the logs for normal
> activity (you can even set up your alerts so instead of just calling a
> person, it first does a monitoring probe in case the traffic had just
> dropped off)
>
> One thing to remember, your load balancer's test is not testing to see if
> the product works, just that the webserver works. You need other tests to
> make sure that all the web hits you are getting aren't just generating a
> 'database error, try again later' response ;-)
>
> David Lang
> _______________________________________________
> Discuss mailing list
> [email protected]
> https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss
> This list provided by the League of Professional System Administrators
> http://lopsa.org/
>

What does everyone here use for (host) hardware monitoring? At $work we
use two things: host-side scripts that run periodically, parse the output
of vendor-specific binaries, and send alerts to our monitoring servers;
and the vendor hardware agents, which send SNMP traps. Both approaches
have shortcomings, and I'm currently splitting my time trying to improve
both of them.
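
To make the host-side script approach concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not what any particular vendor tool produces: the "component: status" output format, the set of healthy status words, and the function names are all hypothetical, and real vendor binaries each need their own parser.

```python
import subprocess

# Status words treated as healthy. This set is an assumption for the
# sketch; anything not listed here is reported as a problem.
HEALTHY = {"ok", "optimal"}


def parse_status(output):
    """Parse hypothetical 'component: status' lines into a dict.

    Lines without a colon (blank lines, banners) are skipped.
    """
    components = {}
    for line in output.splitlines():
        if ":" not in line:
            continue
        name, _, status = line.partition(":")
        components[name.strip()] = status.strip().lower()
    return components


def failing_components(output):
    """Return sorted (name, status) pairs whose status is not in HEALTHY."""
    parsed = parse_status(output)
    return sorted((n, s) for n, s in parsed.items() if s not in HEALTHY)


def run_check(command):
    """Run the vendor command and return a list of (name, status) failures.

    A command that cannot be run, times out, or exits non-zero is itself
    reported as a failure, so a broken check does not look like healthy
    hardware.
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return [("check", f"could not run: {exc}")]
    if result.returncode != 0:
        return [("check", f"exit code {result.returncode}")]
    return failing_components(result.stdout)
```

Whatever run_check() returns would then be handed to the alerting side. Treating "the check itself failed" as an alert is a deliberate choice here, since a silently broken parser is one of the ways this kind of script goes wrong.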
