On 2/9/10 1:09 PM, "Justin Lloyd" <jll...@digitalglobe.com> wrote:

> Has anyone done any investigation into having a monitoring tool like
> Zenoss (which we use), Nagios, or OpenNMS watch for repairs? At the very
> least, centralizing at least some of Cfengine hosts' logs and using a
> log-watching tool like Swatch or Splunk would be a step in the right
> direction.
>
> Team Cfengine: Is there any kind of roadmap for integration with such
> third-party monitoring tools?
We ended up incorporating sensible checks into our local Nagios instance. "Sensible" was determined by poking around the working directory on healthy and sick hosts. Each host currently watches the state of various files beneath /var/cfengine and alerts if they grow stale, if cfagent exits non-zero, etc. This tells us when cfengine is not running at all, catches a few edge cases like corrupt BDBs, and lets us escalate/email/page.

In general, we've adopted the mantra "Email is not for monitoring." We all get lots of mail already, and email is unreliable, subject to quotas, offers no trending or reporting, and has no way to handle exceptions (such as scheduled downtime). Ditching email in favor of a real monitoring system will certainly make your life easier in the long run. You can still send informational emails as needed, but it's good to consider anything sent via email as an optional read and anything routed through monitoring as requiring action.

Unfortunately, we have not yet implemented anything providing the per-promise granularity you are looking for. I wish we had it today! In the past I have thought that policies could syslog everything over TCP (doing it right takes some effort; syslog-ng or stunnel could work) to central Splunk servers (or whatever). It would certainly be possible to build a meaningful history of "everything" that way, but building filters to extract the useful data would take time.
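For anyone wanting a starting point, here is a rough sketch of the kind of staleness check described above, written as a Nagios-style plugin in Python. This is not our production check. The function names, the `--max-age` option, and the one-hour default are all assumptions made for illustration, and which files under /var/cfengine are worth watching is left to the caller, since that depends on what you find on your own healthy and sick hosts. The only convention relied on is the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
"""Nagios-style freshness check for files under a cfengine work directory."""
import argparse
import os
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_staleness(paths, max_age, now=None):
    """Return (exit_code, message) for the given files.

    A file is stale when its mtime is more than max_age seconds old.
    A file that cannot be stat()ed (missing, unreadable) is also reported.
    `now` can be injected to make the check testable.
    """
    now = time.time() if now is None else now
    problems = []
    for path in paths:
        try:
            age = now - os.stat(path).st_mtime
        except OSError as exc:
            problems.append(f"{path}: {exc.strerror}")
            continue
        if age > max_age:
            problems.append(f"{path}: stale ({int(age)}s old)")
    if problems:
        return CRITICAL, "CRITICAL - " + "; ".join(problems)
    return OK, f"OK - {len(paths)} file(s) updated within {max_age}s"


def main(argv=None):
    """Parse arguments, print the one-line status, return the exit code."""
    parser = argparse.ArgumentParser(description=__doc__)
    # Hypothetical option name and default; tune to your run interval.
    parser.add_argument("--max-age", type=int, default=3600,
                        help="seconds before a file counts as stale")
    parser.add_argument("paths", nargs="+", help="files to watch")
    args = parser.parse_args(argv)
    code, message = check_staleness(args.paths, args.max_age)
    print(message)
    return code
```

To run it as an actual plugin, finish the script with `sys.exit(main())` (after `import sys`) so the exit code reaches Nagios, and pass the files you care about on the command line. Treating every problem as CRITICAL is a simplification; a separate warning threshold would be an easy extension.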