Tom Limoncelli wrote:
> Part of the problem is that there are four ponies here not one.
>
Four is an optimistic count.

>
> - Historical monitoring: Gathering statistics via SNMP or similar,
> storing them, and drawing pretty graphs.
> - Real-time monitoring: ping and other "is it up/down?" queries.
>
> These two things are so different that I rarely see software that can do
> both very well.
>

But they aren't really that different. Stats/metrics are what should drive real-time monitoring; they are far richer and more interesting than simple boolean checks (although there are always some of those as well). Now, I agree that real-time analysis of these metrics and long-term storage are two different things.

> Real-time should keep the last n-minutes of results in RAM for
> fast calculations. Historical monitoring should stash things on disk and
> move on.
>

So maybe you agree, and just overstated how different these really are. Real-time monitoring also needs to include a capacity management component, so you can tell before the end of a cycle (whatever that is for you) whether you're going to run out of capacity.

> There are at least two more components:
>
> - Alerting: Say you know something is "wrong", the alerting system has
> to decide who to contact (based on a pager rotation schedule, etc.) and how
> to contact them (email or pager depending on ToD, urgency, and so on), and
> implements the escalation policy.
>

Whoa, whoa, whoa. That's just one of many ways of doing "alerting"; dashboards are also commonly used.

But really, everyone has missed the most important thing about monitoring. Say you have the system of your dreams set up and working. That's great, but useless by itself. What matters most is the workflow surrounding alerts: how you deal with alerts that are actionable, those that aren't, the ones that are long-term issues, and so on. Integration with outage tracking systems, ticketing systems, hardware and software provisioning, software development processes, and so on matters a whole lot more to the long-term success of a monitoring implementation.
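To make the alerting component Tom describes a bit more concrete, here is a minimal Python sketch of just the routing decision: who is on call from a rotation, which channel to use based on urgency and time of day, and when to escalate. Everything in it is a hypothetical assumption, not any real tool's behavior: the names in ROTATION, a weekly rotation starting 2024-01-01, 9:00 to 18:00 weekday business hours, a two-level urgency scheme, a 15-minute escalation timeout, and the choice to defer low-urgency alerts outside business hours.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Tuple

# Hypothetical weekly pager rotation; names are placeholders.
ROTATION = ["alice", "bob", "carol"]
ROTATION_START = datetime(2024, 1, 1)   # assumed start of rotation week 0 (a Monday)
ESCALATE_AFTER = timedelta(minutes=15)  # assumed ack timeout per escalation step


@dataclass
class Alert:
    name: str
    urgency: str  # "high" or "low" (assumed two-level scheme)
    raised_at: datetime


def on_call(now: datetime, step: int = 0) -> str:
    """Who is on call this week; step > 0 walks further down the rotation."""
    week = (now - ROTATION_START).days // 7
    return ROTATION[(week + step) % len(ROTATION)]


def channel(alert: Alert, now: datetime) -> str:
    """Pick a channel from urgency and time of day (assumed policy)."""
    business_hours = now.weekday() < 5 and 9 <= now.hour < 18
    if alert.urgency == "high":
        return "pager"
    # Low urgency: email during the day, otherwise hold it until morning.
    return "email" if business_hours else "defer"


def route(alert: Alert, now: datetime) -> Tuple[str, str]:
    """Return (person, channel) for an alert that is still unacknowledged.

    Escalates one step down the rotation per elapsed ESCALATE_AFTER period.
    """
    step = int((now - alert.raised_at) / ESCALATE_AFTER)
    return on_call(now, step), channel(alert, now)
```

This only covers the mechanics; as argued above, the workflow around what happens after someone gets paged is where most of the real effort goes.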
>
> - Graphing/dashboard: The system that draws the dashboards and pretty
> graphs mentioned above.
>

That's nice, but only useful in a manual way of doing things. What you want in addition are things like:

- trending analysis and reporting
- rich logging infrastructure (not for real-time monitoring/alerting, but as a supplement)
- lots of people are fond of (re)active monitoring, where certain alerts trigger automated actions. (Yikes! I've yet to see a good use case for this.)

> It would be nice if we had well-defined interfaces between these components
> so that we could mix and match.
>

It sounds nice, but my guess is that such an approach would be overly complex and ultimately doomed. There are some hard design choices which I suspect need to be made early, based on key requirements that will vary from one environment to another. And hardest of all: will whatever you build scale?

Finally, to all those clamoring for a dream solution: the hardest part isn't the monitoring infrastructure itself, but integrating it into your environment. That is extremely time-consuming and expensive, and requires a huge commitment from the business.
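As one hedged illustration of the trending analysis mentioned above, and of the earlier point about knowing before the end of a cycle whether you will run out of capacity, here is a minimal Python sketch that keeps the last n samples in RAM and extrapolates an ordinary least-squares line. The class and method names, the window size, and the assumption that usage grows roughly linearly are all illustrative; real capacity data often has seasonality that this ignores.

```python
from collections import deque
from typing import Optional, Tuple


class CapacityTrend:
    """Keep the last `window` samples in memory and extrapolate a linear trend."""

    def __init__(self, capacity: float, window: int = 60):
        self.capacity = capacity             # e.g. disk size (illustrative unit)
        self.samples = deque(maxlen=window)  # (timestamp, value); oldest dropped first

    def add(self, t: float, value: float) -> None:
        self.samples.append((t, value))

    def fit(self) -> Optional[Tuple[float, float]]:
        """Ordinary least-squares (slope, intercept), or None if underdetermined."""
        n = len(self.samples)
        if n < 2:
            return None
        mean_t = sum(t for t, _ in self.samples) / n
        mean_v = sum(v for _, v in self.samples) / n
        cov = sum((t - mean_t) * (v - mean_v) for t, v in self.samples)
        var = sum((t - mean_t) ** 2 for t, _ in self.samples)
        if var == 0:
            return None
        slope = cov / var
        return slope, mean_v - slope * mean_t

    def time_to_full(self) -> Optional[float]:
        """Time from the latest sample until the trend line reaches capacity."""
        fitted = self.fit()
        if fitted is None or fitted[0] <= 0:
            return None  # not enough data, or usage is not growing
        slope, intercept = fitted
        t_last = self.samples[-1][0]
        return (self.capacity - (intercept + slope * t_last)) / slope

    def exhausted_before(self, cycle_end: float) -> bool:
        """True if the trend says capacity runs out by `cycle_end`."""
        ttf = self.time_to_full()
        return ttf is not None and self.samples[-1][0] + ttf <= cycle_end
```

The bounded deque is what keeps this cheap enough for the real-time side; the full history would still go to the historical store.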
_______________________________________________
Discuss mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
http://lopsa.org/
