Good day,

I have two topics I would like to gather opinions on from the Ganglia world.

The first is that I have been looking at several different ways of 
receiving alerts when a node goes down. At the moment I have just 
MacGyver'd a solution, based on a script Richard gave me. It just sends an 
email when a node stops reporting. I am still working out issues in it and I 
would like something a little more detailed. Looking through the archive, I 
noticed a few discussions on Nagios plug-ins, but from what I have read I 
understand that it is a completely different ballgame. I would like to ask the 
Ganglia group what program they use to send alerts and whether they would share 
their experiences with system alerts.
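
To make the kind of check I mean concrete, here is a minimal sketch in Python. 
It is not Richard's script. It assumes gmond is answering on its usual TCP 
port (8649) and that each HOST element in the XML carries TN (seconds since 
the last report) and TMAX attributes. The function names and the "four times 
TMAX" cutoff are my own assumptions for illustration.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read the XML dump gmond sends to any client that connects (like netcat)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(65536)
            if not data:  # gmond closes the connection when the dump is done
                break
            chunks.append(data)
    # Return bytes so the parser can honor the encoding declared in the XML.
    return b"".join(chunks)


def find_silent_hosts(xml_text, factor=4):
    """Return (name, seconds_silent) for hosts whose TN exceeds factor * TMAX.

    The cutoff is an assumption for this sketch, not a Ganglia requirement.
    """
    silent = []
    for host in ET.fromstring(xml_text).iter("HOST"):
        tn = int(host.get("TN", "0"))
        tmax = int(host.get("TMAX", "20"))
        if tn > factor * tmax:
            silent.append((host.get("NAME"), tn))
    return silent
```

Something like find_silent_hosts(fetch_gmond_xml()) could then be run from 
cron, with an email sent whenever the returned list is not empty.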

On another note.
After monitoring for a while, one thing that Ganglia has brought to my 
attention is that a few of the servers were WAY too heavily loaded and never 
left the red, while others really didn't seem to do anything and rarely left the 
blue. Now I am in the middle of offloading the work onto the least used 
systems and would like to include the data in my reports. Basically, what I am 
after is a report at the end of the week that tells 
me ComputerA was under heavy load 90% of the time, ComputerB did jack squat 
this past week, and ComputerC maintained a 50-80% work load this past week. 
Ganglia is great for eyeballing the situation and doing quick estimates of 
load balancing, but I would like to view some raw data as well as the graphs.

I am trying to write a script that pulls the info from netcat and averages out 
some numbers, but I believe that there is an easier way. Does Ganglia store data 
in such a way that I could pull this type of information? This appears so 
useful to me that I am sure others have tried it. Are 
there any ideas or suggestions?
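
For reference, this is roughly the shape of what I am attempting, as a 
minimal Python sketch rather than a finished tool. It assumes the XML that 
netcat returns from gmond contains load_one and cpu_num METRIC entries for 
each host. The function names and the 0.8 / 0.1 thresholds for "heavy" and 
"idle" are hypothetical choices of mine.

```python
import xml.etree.ElementTree as ET


def load_per_cpu(xml_text):
    """Map host name -> load_one / cpu_num from one gmond XML snapshot."""
    result = {}
    for host in ET.fromstring(xml_text).iter("HOST"):
        metrics = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
        if "load_one" in metrics and "cpu_num" in metrics:
            cpus = max(int(float(metrics["cpu_num"])), 1)  # avoid dividing by 0
            result[host.get("NAME")] = float(metrics["load_one"]) / cpus
    return result


def summarize(samples, heavy=0.8, idle=0.1):
    """Percent of samples at/above `heavy`, at/below `idle`, and the mean.

    `samples` is one host's per-CPU load values collected over the week.
    Returns None when there are no samples.
    """
    n = len(samples)
    if n == 0:
        return None
    return {
        "heavy_pct": 100.0 * sum(s >= heavy for s in samples) / n,
        "idle_pct": 100.0 * sum(s <= idle for s in samples) / n,
        "mean": sum(samples) / n,
    }
```

The idea would be to call load_per_cpu on a snapshot every few minutes, 
append each host's value to its own list, and run summarize on those lists 
at the end of the week.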

Any comments are welcome.
Thanks everyone!
Chris Stackpole
