Part of the problem is that there are four ponies here, not one.

- Historical monitoring: Gathering statistics via SNMP or similar, storing them, and drawing pretty graphs.
- Real-time monitoring: ping and other "is it up/down?" queries.

These two things are so different that I rarely see software that does both well. Real-time monitoring should keep the last n minutes of results in RAM for fast calculations. Historical monitoring should stash things on disk and move on.

There are at least two more components:

- Alerting: Once you know something is "wrong", the alerting system has to decide who to contact (based on a pager rotation schedule, etc.) and how to contact them (email or pager, depending on time of day, urgency, and so on), and then implement the escalation policy.
- Graphing/dashboard: The system that draws the dashboards and pretty graphs mentioned above.

It would be nice if we had well-defined interfaces between these components so that we could mix and match.

Tom

P.S. Has anyone tried http://opentsdb.net/ ? It looks very interesting.
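
To make the "well-defined interfaces" idea concrete, here is a rough sketch of how the four pieces might plug together. Everything in it is made up for illustration: the names (Sample, HistoricalStore, Alerter, Dashboard, RealTimeWindow, handle), the 50% failure threshold, and the in-memory stand-ins are assumptions, not any existing tool's API. The window tracks a single target to keep the example short.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable, List, Protocol, Tuple


@dataclass(frozen=True)
class Sample:
    target: str       # host or service being checked (hypothetical field)
    timestamp: float  # seconds since the epoch
    ok: bool          # result of an "is it up/down?" probe


# Hypothetical interfaces: any backend that has these methods can be swapped in.
class HistoricalStore(Protocol):
    def append(self, sample: Sample) -> None: ...


class Alerter(Protocol):
    def notify(self, target: str, message: str) -> None: ...


class Dashboard(Protocol):
    def render(self, samples: Iterable[Sample]) -> str: ...


class RealTimeWindow:
    """Real-time side: keep only the last `window_seconds` of results in RAM."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._samples: Deque[Sample] = deque()

    def add(self, sample: Sample) -> None:
        self._samples.append(sample)
        cutoff = sample.timestamp - self.window_seconds
        # Drop everything older than the window, oldest first.
        while self._samples and self._samples[0].timestamp < cutoff:
            self._samples.popleft()

    def failure_ratio(self) -> float:
        if not self._samples:
            return 0.0
        failed = sum(1 for s in self._samples if not s.ok)
        return failed / len(self._samples)


class ListStore:
    """Stand-in for the historical side; a real one would write to disk."""

    def __init__(self) -> None:
        self.rows: List[Sample] = []

    def append(self, sample: Sample) -> None:
        self.rows.append(sample)


class RecordingAlerter:
    """Stand-in for alerting; a real one would consult the pager rotation."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, str]] = []

    def notify(self, target: str, message: str) -> None:
        self.sent.append((target, message))


def handle(sample: Sample, window: RealTimeWindow, store: HistoricalStore,
           alerter: Alerter, threshold: float = 0.5) -> None:
    """Feed one probe result to the historical, real-time, and alerting parts."""
    store.append(sample)   # historical: stash it and move on
    window.add(sample)     # real-time: update the in-RAM window
    ratio = window.failure_ratio()
    if ratio >= threshold:
        alerter.notify(sample.target, f"{ratio:.0%} of recent checks failed")
```

The point is only that handle() knows nothing about how the store, the alerter, or a dashboard are implemented, so each one could be replaced independently.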
