Good day. I have two topics I would like to gather opinions on from the Ganglia world.
The first is getting alerts when a node goes down. I have been looking at several different ways of doing this. At the moment I have just MacGyver'd a solution based on a script Richard gave me; it simply sends an email when a node stops reporting. I am still working out some issues in it, and I would like something a little more detailed. Looking through the archive, I noticed a few discussions of Nagios plug-ins, but from what I have read, that is a completely different ballgame. What program does the Ganglia group use to send alerts, and would you share your experiences with system alerts?

On another note: after monitoring for a while, one thing Ganglia has brought to my attention is that a few of the servers were WAY too heavily loaded and never left the red, while others didn't seem to do much of anything and rarely left the blue. I am now in the middle of offloading work onto the least-used systems, and I would like to include this data in my reports. Basically, I would like a report at the end of the week that tells me ComputerA was under heavy load 90% of the time, ComputerB did jack squat this past week, and ComputerC maintained a 50-80% workload this past week.

Ganglia is great for eyeballing the situation and making quick load-balancing estimates, but I would like to view some raw data as well as the graphs. I am trying to write a script that pulls the info via netcat and averages out some numbers, but I suspect there is an easier way. Does Ganglia store data in such a way that I could pull this type of information? This seems so useful that I am sure others have tried it. Does anyone have ideas or suggestions?

Any comments are welcome. Thanks, everyone!

Chris Stackpole
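For the node-down alert, here is a minimal sketch of that kind of check, assuming gmond is listening on its default TCP port 8649 and dumps its XML state to any client that connects (the same output netcat shows). It flags any HOST element whose TN attribute (seconds since the host last reported) exceeds a multiple of its TMAX. The factor of 4 and the function names are my own assumptions, not something Ganglia defines.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="localhost", port=8649, timeout=10.0):
    """Read the complete XML dump gmond sends to a client on its TCP port."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def stale_hosts(xml_bytes, factor=4):
    """Return (name, tn, tmax) for every host whose last report is older
    than factor * TMAX seconds. The factor is an assumed threshold."""
    root = ET.fromstring(xml_bytes)
    stale = []
    for host in root.iter("HOST"):
        tn = int(host.get("TN", "0"))      # seconds since last heard from
        tmax = int(host.get("TMAX", "20"))  # expected max reporting interval
        if tn > factor * tmax:
            stale.append((host.get("NAME"), tn, tmax))
    return stale
```

Running `stale_hosts(fetch_gmond_xml("your-gmond-host"))` from cron and mailing any non-empty result would reproduce the current email alert, with the added detail of how long each node has been silent.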
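For the weekly report, gmetad keeps its history in RRD files (one per metric per host, such as `load_one.rrd` and `cpu_num.rrd`), so `rrdtool fetch` can dump the raw numbers instead of scraping netcat. Below is a sketch under that assumption: it parses the text output of `rrdtool fetch` and counts what fraction of samples fall into load-per-CPU bands. The band cutoffs and function names are hypothetical; tune them to match what you consider "red" and "blue".

```python
import math


def parse_rrdtool_fetch(text):
    """Parse `rrdtool fetch` text output into (timestamp, value) pairs for
    the first data source. Rows with no data (nan / -nan) are skipped."""
    samples = []
    for line in text.splitlines():
        if ":" not in line:
            continue  # header line with DS names, or blank separator
        stamp, _, rest = line.partition(":")
        fields = rest.split()
        if not stamp.strip().isdigit() or not fields:
            continue
        value = float(fields[0])  # float() accepts "nan" and "-nan"
        if not math.isnan(value):
            samples.append((int(stamp), value))
    return samples


# Hypothetical bands for load_one divided by CPU count (upper bound, label).
BANDS = [(0.25, "idle"), (0.75, "moderate"), (float("inf"), "heavy")]


def load_report(samples, cpus, bands=BANDS):
    """Return the fraction of samples falling into each load-per-CPU band."""
    counts = {label: 0 for _, label in bands}
    for _, load in samples:
        per_cpu = load / cpus
        for upper, label in bands:
            if per_cpu < upper:
                counts[label] += 1
                break
    total = len(samples)
    return {label: (n / total if total else 0.0) for label, n in counts.items()}
```

For example, feed the output of something like `rrdtool fetch <rrd_rootdir>/<cluster>/<host>/load_one.rrd AVERAGE --start -1w` to `parse_rrdtool_fetch`, then call `load_report(samples, cpus=4)` for a 4-CPU box. Because rows in an RRD archive are evenly spaced, the fraction of samples approximates the fraction of time spent in each band. The directory layout here is an assumption to check against your gmetad configuration.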

