Hi,

I've recently been working on an HPC system (~600 nodes) that uses a combination of Ganglia and Nagios: Ganglia for gathering metrics and Nagios for generating alerts (more or less). One of the constraints on the monitoring implementation was that any monitoring agent should have as small a footprint as possible on the client systems, which is why Ganglia's gmond was chosen over the NRPE / Nagios plugin combination.

We chose this implementation based on the availability of the ganglia-nagios plugin, but found that the way it is implemented quickly choked the life out of the monitoring server: 5500 probes over a 5 minute interval, each of which downloads a full copy of the XML file from the gmond collector's accept channel, is quite an overhead. The gmond collector was soon overwhelmed and we started seeing gaps in the graphs presented by the web UI, because the RRD files were being starved of updates.
So I had a quick look at pulling data directly from the RRD files instead. I wrote a shell script that uses rrdtool to extract the last entry from the RRD file and present the result to Nagios. It made a huge difference to the performance of the monitoring server and restored Ganglia to its normal working condition, and Nagios can now generate alerts based on the information gathered by gmond.

I've attached a copy of this script, along with a Python equivalent, in the hope that it might prove useful to other cluster admins. It works well in our environment: the shell script ran for about a month before we switched over to the Python version, which has been in place for about 2 weeks.

Regards,
Malcolm.
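For readers without the attachment, the RRD-based approach described above could look roughly like the sketch below. It is not the attached script, only an illustration under some assumptions: that `rrdtool` is on the PATH, that the first data source in the RRD file is the one of interest, and that a sample older than 10 minutes should count as stale. The function names, the `max_age` default and the example path in the usage note are hypothetical. The exit codes 0 to 3 follow the usual Nagios plugin convention.

```python
#!/usr/bin/env python3
"""Sketch of a Nagios check that reads the newest sample from an RRD file."""
import math
import subprocess
import sys
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def parse_lastupdate(text):
    """Return (timestamp, first value) from `rrdtool lastupdate` output.

    The output is a header line of data source names followed by a line
    of the form "<epoch>: <value> [<value> ...]".
    """
    for line in text.splitlines():
        stamp, sep, values = line.partition(":")
        fields = values.split()
        if sep and stamp.strip().isdigit() and fields:
            # float() raises ValueError for an unknown sample such as "U".
            return int(stamp), float(fields[0])
    raise ValueError("no data line in rrdtool output")


def evaluate(value, timestamp, warn, crit, max_age=600, now=None):
    """Map a sample onto a Nagios state; max_age (seconds) is an assumption."""
    now = time.time() if now is None else now
    if math.isnan(value):
        return UNKNOWN, "metric value is unknown"
    if now - timestamp > max_age:
        return UNKNOWN, "last update is %d s old" % (now - timestamp)
    if value >= crit:
        return CRITICAL, "value=%s" % value
    if value >= warn:
        return WARNING, "value=%s" % value
    return OK, "value=%s" % value


def main(argv):
    if len(argv) != 4:
        print("UNKNOWN - usage: %s <file.rrd> <warn> <crit>" % argv[0])
        return UNKNOWN
    try:
        warn, crit = float(argv[2]), float(argv[3])
        # One local file read per check, instead of an XML download from gmond.
        out = subprocess.run(
            ["rrdtool", "lastupdate", argv[1]],
            capture_output=True, text=True, check=True,
        ).stdout
        stamp, value = parse_lastupdate(out)
    except (OSError, subprocess.CalledProcessError, ValueError) as exc:
        print("UNKNOWN - %s" % exc)
        return UNKNOWN
    code, detail = evaluate(value, stamp, warn, crit)
    print("%s - %s" % (LABELS[code], detail))
    return code


# Only act as a plugin when arguments are supplied on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

It would be called from a Nagios command definition with the RRD file for one host and metric plus the two thresholds, for example a path such as `/var/lib/ganglia/rrds/<cluster>/<host>/load_one.rrd` (the location depends on how gmetad is configured). Treating stale samples as UNKNOWN is a design choice of this sketch, so that a dead gmond does not silently report the last good value.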
check_ganglia_rrd.tgz
Description: GNU Zip compressed data
_______________________________________________ Ganglia-general mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/ganglia-general

