Without actually answering your question, I'm curious about this part: "The task, in short, is the implementation of real-time-monitoring, i.e. a timeframe of about 30 minutes to 3 hours overall. gmond should collect the data every 1 or 2 seconds, so state changes can be immediately recognized and be dealt with."
It sounds like by "changes" you mean something that should be classified as "not good" and potentially actionable, like a server running out of connections, or memory being swapped, or some other sort of badness that should be alerted on.

My question is: is the effort to make ganglia report metrics in "real time" (i.e. every 1 or 2 seconds) worth it? Every environment is different, of course, and so is each environment's sensitivity to change or failure. In my experience, though, past data is available (with ganglia especially) at sufficient resolution, even at one sample per minute, to predict when bad things are going to happen before they actually do. This releases you from having to act on a failure or change with no time to spare.

Nagios and other alerting monitors handle failure scenarios this way, as I'm sure you know. "Warning" thresholds and "critical" thresholds are set so that you have enough time to react without it being an absolute emergency.

-john allspaw

----- Original Message ----
From: Thomas Spanner <[EMAIL PROTECTED]>
To: [email protected]
Sent: Saturday, October 27, 2007 7:59:57 AM
Subject: [Ganglia-developers] real-time-monitoring

Hi all!

Last month a friend of mine and I started a project at our university (TU Munich) involving your nice little software. This should be our last certificate, so we can finally begin our final exams.

The task, in short, is the implementation of real-time-monitoring, i.e. a timeframe of about 30 minutes to 3 hours overall. gmond should collect the data every 1 or 2 seconds, so state changes can be immediately recognized and be dealt with.

So far, we have been studying the code and the rrdtool, and hopefully understood how these things work. Before we play around with the variables, I'd like to ask your opinion:

1. can this be accomplished?
My concern is that the overhead becomes too high and ganglia slows the whole cluster down (there must be a reason why the step variable of gmetad is 15 seconds by default).

2. is it sufficient to lower the „collect_every“ and „time_threshold“ variables in gmond.conf to speed up the data collection, or isn't it that simple?

I want gmond to write its output to the console so I can see what the collector does. So I change a value, stop and restart with „gmond -d9 start“, but then get:

[...] tcp_listen() on xml_port failed: Address already in use

How do I get the output without restarting the computer? (When I start gmond for the first time it works.) I think I tried restarting gmetad, too, but the problem remains.

Your help is greatly appreciated,
thanks,
Tom and Percy

_______________________________________________
Ganglia-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
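
The point in the reply about minute-resolution history and "warning"/"critical" thresholds can be illustrated with a small sketch. This is only one possible reading of it: the linear least-squares extrapolation is an assumption chosen for the example, not something described in the thread, and the names linear_fit, seconds_until and classify, as well as the sample values and thresholds, are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (time in seconds, metric value)


def linear_fit(samples: Sequence[Sample]) -> Tuple[float, float]:
    """Return (slope, intercept) of the least-squares line through samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = cov / var_t
    return slope, mean_v - slope * mean_t


def seconds_until(samples: Sequence[Sample], limit: float) -> Optional[float]:
    """Estimate seconds from the last sample until the trend reaches limit.

    Returns 0.0 if the last value is already at or above the limit, and
    None if the fitted trend is flat or falling.
    """
    last_t, last_v = samples[-1]
    if last_v >= limit:
        return 0.0
    slope, intercept = linear_fit(samples)
    if slope <= 0:
        return None
    return max(0.0, (limit - intercept) / slope - last_t)


def classify(value: float, warning: float, critical: float) -> str:
    """Nagios-style state for a metric where higher values are worse."""
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # Hypothetical one-minute samples of memory usage in percent.
    history = [(0.0, 50.0), (60.0, 52.0), (120.0, 54.0), (180.0, 56.0)]
    print(classify(history[-1][1], warning=80.0, critical=90.0))
    print(seconds_until(history, limit=90.0))
```

With samples one minute apart, the estimate still leaves time to react before the critical threshold is reached, which is the argument against needing 1 or 2 second collection for this purpose. A real metric is rarely this linear, so the estimate should be read as a rough lead time only.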
