So, once you've gotten Ganglia to pull in metrics from gazillions of nodes in umpteen clusters, and got pretty graphs of everything, what do you use for monitoring the values? I mean, when a machine goes down, you don't want just a webpage to be updated, you want something to trigger the klaxons.
I've tried to adapt Nagios (formerly known as Netsaint) for that purpose, but Nagios doesn't really fit the bill; it's designed to collect its own monitoring data and is not very happy with just being fed data from other sources.

-- 
Leif Nixon                                    Systems expert
------------------------------------------------------------
National Supercomputer Centre           Linkoping University
------------------------------------------------------------

