matt massie wrote:
steve-
your idea is right on. i'm actually embarrassed that i didn't think of
that before.
man... if you only have a 128-node cluster w/30 standard metrics.. that
means that every 15 seconds you are making 3840 gettimeofday() system
calls. ouch!
there is also another more subtle problem. the cluster timestamp in the
XML was added to handle clusters scattered over different timezones. your
method is right on because it will ensure that the timeline for metrics
matches the real time on the remote clusters (and not the time of the
machine collecting the cluster data).
also... your method ensures the data values match exactly the time they
are valid... the old method is VERY sensitive to the timing of parsing the
XML and saving to the databases.
we need to wrap that change up into a release and get that out to the
users soon. actually, federico wrapped up a 2.5.2 release and i dropped
the ball on getting it the rest of the way out the door. i have been
swamped with meetings, conferences and working on ganglia 3 lately... i
think it might be smart if i focus back on getting ganglia 2.5.x cleaned
up and ready for a good maintenance release.
do you have a patch for the changes against the latest CVS source? how
has your fix been working for you lately?
I'd been meaning to do a follow-up on this. I'm having a problem with RRDs
not being written to over time.
At first I thought, "Oh, crap. I've tampered with rrd_tools.c and this is
clearly a case of divine retribution."
So I swapped out gmetad binaries, and the same damn thing started
happening, only this time I had real gappy data for about 40-45 minutes and
then it stopped entirely (while gmetad still runs).
What I should have done was check the timestamps to see whether the data
source threads were dying off entirely or whether it was just a problem
updating RRDs, but I didn't do that because it would have made sense and
been beneficial from a troubleshooting standpoint, and who needs that?
Other people seem to have had similar problems with various versions of
gmetad, if memory serves me correctly (*bites yellow bell
pepper*). So this may be an unrelated bug. Plus, I just changed a few
things from a filesystem point of view about the RRD files and am now
encountering such weirdness as having one entire data source (a two-node
test cluster, whoopty-doo) with blank data. Yet the RRD files'
last-modified timestamps are being updated every minute...
Running gmetad with debug output on for an hour with this many
hosts/metrics isn't really an option. That'd be a fricken' huge logfile.
It's worth noting that my production gmetad is 2.5.0, and my
timestamp-modified version is off the 2.5.2 release codebase. Next step if
this keeps up is to put together a patch and apply it to 2.5.0 gmetad and
see if it exhibits the same behavior...
Although the only real difference in gmetad from 2.5.0 to 2.5.2 is the
inclusion of Fed's grid logic, if I'm not mistaken (Changelog? what's
that?). So I don't know if any of the recent gmetad modifications could
have introduced such wackiness. It's probably just a condition that's
starting to pop up in running gmetad with this many sources, hosts and
metrics on this system...
Bleh.