Absolutely agreed, subsecond makes no sense and the ganglia design is not appropriate for that anyway. I was originally asked to do 5 seconds, but I have increased that to 10 seconds as there was no meaningful change in the shape of the graphs anyway.
But 10 second polling is useful to me for a subset of our estate that does Monte Carlo pricing of instruments on demand for the traders. These calculations take 5-30 seconds, and preserving the peak of this load spike helps us with cluster sizing. Most other HPC environments out there do job runs that are much longer, and for them a 5 second poll is silly.

Martin, bug entered as #110. Maybe a limit of a few seconds is appropriate, just to limit silly gmond.conf poll rates and ensure /proc/blah is read once per entire group of metrics.

- kind regards, richard

-----Original Message-----
From: Martin Knoblauch [mailto:[EMAIL PROTECTED]
Sent: 11 August 2006 12:12
To: Grevis, Richard: IT (LDN); [EMAIL PROTECTED]; [EMAIL PROTECTED]
Cc: [email protected]
Subject: Re: [Ganglia-general] Obtaining Immediate Interval Data From Ganglia

Correct. The code below limits the sampling rate for the cpu*, load*, mem* and net* graphs. Setting the thresholds to 0 will give you 1 second "accuracy". Or "nice furry graphs" as Richard said (actually the "furriness" is what the original authors wanted to prevent :-). Personally I doubt that sampling load* and mem* at that rate makes sense. cpu* and net* may make sense.

Richard, yes please file a report.

Unfortunately I spoke too soon when I mentioned that we should get rid of the intervals altogether. The reason is that we need to compute differences for the cpu* and net* metrics (they are rates after all). If we want to have sub-second sampling rates, we need to use "gettimeofday" instead of "time".

--- [EMAIL PROTECTED] wrote:

> If you do want to do fast polling on the Linux or cygwin gmond, I
> found some hardwired code in there which effectively limits the
> polling rate for some metrics no matter what you put in the config
> files. (Sorry martin, have not raised a bug report yet). Anyway:
>
> the code below is in the cygwin and linux metric.c files.
> >
> > --------------------------------------------------------
> > typedef struct {
> >   uint32_t last_read;
> >   uint32_t thresh;
> >   char *name;
> >   char buffer[BUFFSIZE];
> > } timely_file;
> >
> > timely_file proc_stat    = { 0, 15, "/proc/stat" };
> > timely_file proc_loadavg = { 0, 15, "/proc/loadavg" };
> > timely_file proc_meminfo = { 0, 30, "/proc/meminfo" };
> > timely_file proc_net_dev = { 0, 30, "/proc/net/dev" };
> >
> > char *update_file(timely_file *tf)
> > {
> >   int now,rval;
> >   now = time(0);
> >   if(now - tf->last_read > tf->thresh) {
> >     rval = slurpfile(tf->name, tf->buffer, BUFFSIZE);
> >     if(rval == SYNAPSE_FAILURE) {
> >       err_msg("update_file() got an error from slurpfile() reading %s", tf->name);
> >       return (char *)SYNAPSE_FAILURE;
> >     }
> >     else tf->last_read = now;
> >   }
> >   return tf->buffer;
> > }
> > --------------------------------------------------------
>
> I have set those timeout values to zero, which works well and gives me
> nice spiky furry graphs.
>
> - richard

------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de
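As an illustration of Martin's point about "gettimeofday": because time() only has whole-second resolution, a threshold of 0 in the quoted update_file() still allows at most one re-read per second. The following is only a rough sketch of how a sub-second variant might look, not the actual gmond code or a proposed patch. The names timely_file_hr, now_seconds, update_file_hr and counter_rate are hypothetical, the BUFFSIZE value is assumed, and plain fopen/fread stands in for slurpfile(), whose exact behaviour is not shown in this thread.

```c
#include <stdio.h>
#include <sys/time.h>

#define BUFFSIZE 8192   /* assumed size; the real value is not shown above */

/* Hypothetical variant of timely_file: timestamp and threshold are
   kept as fractional seconds instead of whole seconds. */
typedef struct {
    double last_read;        /* time of the last successful read, in seconds */
    double thresh;           /* minimum age before re-reading, in seconds */
    const char *name;        /* file to read, e.g. "/proc/stat" */
    char buffer[BUFFSIZE];   /* cached file contents */
} timely_file_hr;

/* Wall-clock time in seconds with microsecond resolution. */
double now_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Re-read the file only if the cached copy is older than thresh.
   Returns the cached buffer, or NULL if the file cannot be opened. */
char *update_file_hr(timely_file_hr *tf)
{
    double now = now_seconds();
    if (now - tf->last_read > tf->thresh) {
        FILE *fp = fopen(tf->name, "r");
        size_t n;
        if (fp == NULL)
            return NULL;
        n = fread(tf->buffer, 1, BUFFSIZE - 1, fp);
        fclose(fp);
        tf->buffer[n] = '\0';
        tf->last_read = now;
    }
    return tf->buffer;
}

/* Rate between two counter samples taken at times t0 and t1 (seconds).
   Returns 0 when no time has elapsed, to avoid dividing by zero. */
double counter_rate(unsigned long long c0, double t0,
                    unsigned long long c1, double t1)
{
    double dt = t1 - t0;
    if (dt <= 0.0)
        return 0.0;
    return (double)(c1 - c0) / dt;
}
```

With something like this, a threshold of 0.5 would allow two reads per second, and the cpu* and net* rates could be computed from the fractional elapsed time rather than from a whole-second difference. Whether sampling that fast is worthwhile is a separate question, as discussed above.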

