steve- your idea is right on. i'm actually embarrassed that i didn't think of that before.
man... if you only have a 128 node cluster w/30 standard metrics.. that means that every 15 seconds you are making 3840 gettimeofday() system calls. ouch!

there is also another more subtle problem. the cluster timestamp in the XML was added to handle clusters scattered over different timezones. your method is right on because it will ensure that the time line for metrics matches the real time on the remote clusters (and not the time of the machine collecting the cluster data).

also... your method ensures the data values match exactly the time they are valid... the old method is VERY sensitive to the time of parsing the XML and saving to the databases.

we need to wrap that change up into a release and get that out to the users soon. actually, federico wrapped up a 2.5.2 release and i dropped the ball on getting it the rest of the way out the door. i have been swamped with meetings, conferences and working on ganglia 3 lately... i think it might be smart if i focus back on getting ganglia 2.5.x cleaned up and ready for a good maintenance release.

do you have a patch for the changes against the latest CVS source? how has your fix been working for you lately?

-- matt

Wednesday, Steven Wagner wrote forth saying...

> So lately, as the size of one of my clusters' RRD directory balloons past
> the third-of-a-gigabyte mark, I've been noticing a dramatic increase in
> data gaps in some of the graphs.
>
> I decided to put my money ... er, *development time* where my mouth ... er,
> *whiny developer ranting* is and modified gmetad to try to use the
> CLUSTER_REPORTED value in RRD_update() or its own version of NOW(), instead
> of leaving that up to RRD.
>
> And, well, boy, it sure does seem to be working.  The gaps had gotten to
> the point where they represented well over 2/3 of the graph.  Now I'm back
> to a steady stream of data.
>
> I'll run this for a few days and make sure that nothing explodes in a
> mountain of fiery goo before I send in a patch.
>
> <rehash alert>
>
> I don't really see a down-side to this, because after all, the XML snapshot
> is from a specific time (represented pretty accurately by
> CLUSTER_REPORTED), so why bother looking up the clock (NUMBER_OF_METRICS *
> NUMBER_OF_HOSTS+1) times every time the data source thread runs?  Plus RRD
> pukes if, for some reason, two updates happen to hit the same RRD in the
> space of one second.
>
> And if RRD returns an error code, the whole thing aborts... making gaps!
>
> Anyway.
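
for anyone following the thread, here is a rough sketch of the idea being discussed: stamp every RRD update from one XML snapshot with the cluster's reported time, rather than letting RRD look up "now" for each metric. this is not the actual gmetad patch. format_rrd_update() and clock_lookups_per_pass() are hypothetical helpers made up for illustration, and the only thing assumed about RRD is that its update command accepts either "N:value" or "timestamp:value".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper, not gmetad's actual code: build the "time:value"
 * argument accepted by rrdtool's update command. A reported time of 0 is
 * treated as "unknown", in which case we fall back to "N" (RRD's shorthand
 * for the current time). Returns 0 on success, -1 if buf is too small. */
static int format_rrd_update(char *buf, size_t len,
                             time_t cluster_reported, const char *value)
{
    int n;

    assert(buf != NULL && value != NULL);
    if (cluster_reported > 0)
        n = snprintf(buf, len, "%ld:%s", (long)cluster_reported, value);
    else
        n = snprintf(buf, len, "N:%s", value);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Clock lookups per polling pass when every metric update asks for the
 * current time: one per metric per host. Reusing the snapshot's single
 * reported timestamp makes these lookups unnecessary. */
static long clock_lookups_per_pass(long hosts, long metrics)
{
    return hosts * metrics;
}
```

with 128 hosts and 30 metrics, clock_lookups_per_pass() gives the 3840 figure from above. since every update in a pass carries the same snapshot time, the RRD timeline follows the remote cluster's clock instead of the moment the collector got around to parsing the XML.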
