Part of the problem is that there are four ponies here, not one.

   - Historical monitoring: Gathering statistics via SNMP or similar,
   storing them, and drawing pretty graphs.
   - Real-time monitoring: ping and other "is it up/down?" queries.

These two things are so different that I rarely see software that does
both very well.  Real-time monitoring should keep the last n minutes of
results in RAM for fast calculations.  Historical monitoring should stash
things on disk and move on.
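To make the split concrete, here is a rough sketch. It is not taken from any particular tool, and the class and function names are made up. The real-time side keeps a sliding window of recent up/down results in memory, so asking "how available was this host over the last five minutes?" never has to touch disk. The historical side just appends each sample to a file and returns.

```python
import time
from collections import deque


class RecentResults:
    """Hypothetical real-time helper: keep only the last `window_seconds`
    of up/down check results in RAM."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self._samples = deque()  # (timestamp, is_up) pairs, oldest first

    def record(self, is_up, now=None):
        now = time.time() if now is None else now
        self._samples.append((now, is_up))
        self._expire(now)

    def _expire(self, now):
        # Drop samples that have fallen out of the window (oldest first).
        cutoff = now - self.window_seconds
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()

    def availability(self, now=None):
        """Fraction of checks in the window that succeeded, or None if empty."""
        now = time.time() if now is None else now
        self._expire(now)
        if not self._samples:
            return None
        ups = sum(1 for _, ok in self._samples if ok)
        return ups / len(self._samples)


def archive_sample(path, host, metric, value, now=None):
    """Hypothetical historical helper: append one sample to disk and move on."""
    now = time.time() if now is None else now
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{now:.0f} {host} {metric} {value}\n")
```

A real historical store would use something like RRD files or a time-series database instead of a flat text file. The point is only that it does no in-memory analysis.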

There are at least two more components:

   - Alerting: Once you know something is "wrong", the alerting system has to
   decide who to contact (based on a pager rotation schedule, etc.) and how to
   contact them (email or pager depending on time of day, urgency, and so on),
   and implement the escalation policy.
   - Graphing/dashboard: The system that draws the dashboards and pretty
   graphs mentioned above.
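The alerting decisions above could look roughly like this. The policy is entirely hypothetical: a weekly on-call rotation by ISO week number, escalation to the next person every 15 unacknowledged minutes, "critical" always pages, and "warning" pages only during business hours and is emailed otherwise. The contact details in any real use would come from your own schedule.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Contact:
    name: str
    email: str
    pager: str


def route_alert(rotation, urgency, when, minutes_unacked=0,
                step_minutes=15, business_hours=(9, 17)):
    """Return (contact, channel, address) for an alert.

    Hypothetical policy, for illustration only:
      * who: weekly rotation by ISO week number, moving to the next person
        for every `step_minutes` the alert stays unacknowledged (escalation);
      * how: "critical" always pages; "warning" pages during business hours
        and is emailed otherwise; anything else is emailed.
    """
    week = when.isocalendar()[1]
    step = minutes_unacked // step_minutes
    contact = rotation[(week + step) % len(rotation)]

    start, end = business_hours
    in_hours = start <= when.hour < end
    if urgency == "critical" or (urgency == "warning" and in_hours):
        return contact, "pager", contact.pager
    return contact, "email", contact.email
```

Keeping "who", "how", and "when to escalate" in one function like this is only a convenience. A real system would probably load the rotation and policy from configuration.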

It would be nice if we had well-defined interfaces between these components
so that we could mix and match.
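Here is one guess at what those interfaces might look like, written as Python `typing.Protocol` classes. Every name here (`Sample`, `Collector`, `HistoricalStore`, and so on) is hypothetical. Any component that provides the right methods could be swapped in without the others noticing.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


@dataclass
class Sample:
    host: str
    metric: str
    timestamp: float
    value: float


@dataclass
class Alert:
    host: str
    message: str
    urgency: str


class HistoricalStore(Protocol):
    """Historical monitoring: stash samples, answer range queries."""
    def store(self, samples: Iterable[Sample]) -> None: ...
    def query(self, host: str, metric: str,
              start: float, end: float) -> List[Sample]: ...


class RealTimeChecker(Protocol):
    """Real-time monitoring: is it up or down right now?"""
    def check(self, host: str) -> bool: ...


class Alerter(Protocol):
    """Alerting: decide who/how to notify and handle escalation."""
    def notify(self, alert: Alert) -> None: ...


class Grapher(Protocol):
    """Graphing/dashboard: render samples pulled from a HistoricalStore."""
    def render(self, samples: List[Sample]) -> bytes: ...


class MemoryStore:
    """Toy HistoricalStore; a real one would write to disk."""
    def __init__(self) -> None:
        self.samples: List[Sample] = []

    def store(self, samples: Iterable[Sample]) -> None:
        self.samples.extend(samples)

    def query(self, host: str, metric: str,
              start: float, end: float) -> List[Sample]:
        return [s for s in self.samples
                if s.host == host and s.metric == metric
                and start <= s.timestamp < end]


def poll_once(hosts: Iterable[str], checker: RealTimeChecker,
              alerter: Alerter) -> None:
    """Glue code that depends only on the interfaces, not on implementations."""
    for host in hosts:
        if not checker.check(host):
            alerter.notify(Alert(host, f"{host} is down", "critical"))
```

The glue code only depends on the protocols, so in principle you could pair one vendor's checker with another's alerter.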

Tom

P.S.  Has anyone tried http://opentsdb.net/ ?  It looks very interesting.
_______________________________________________
Discuss mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
 http://lopsa.org/

Reply via email to