steve-

your idea is right on.  i'm actually embarrassed that i didn't think of 
that before.  

man... if you only have a 128-node cluster w/ 30 standard metrics, that
means that every 15 seconds you are making 3840 gettimeofday() system 
calls.  ouch!  
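
To make that arithmetic concrete, here is a small back-of-the-envelope sketch. It assumes one clock lookup per metric per host on every poll, as described above; the function names and the 15-second default are illustrative and are not taken from gmetad.

```python
SECONDS_PER_DAY = 86400


def clock_lookups_per_poll(hosts: int, metrics_per_host: int) -> int:
    """Clock lookups in one polling cycle, assuming one lookup per metric per host."""
    return hosts * metrics_per_host


def clock_lookups_per_day(hosts: int, metrics_per_host: int,
                          poll_interval_s: int = 15) -> int:
    """Scale the per-poll count up to a full day of polling."""
    polls_per_day = SECONDS_PER_DAY // poll_interval_s
    return clock_lookups_per_poll(hosts, metrics_per_host) * polls_per_day


if __name__ == "__main__":
    print(clock_lookups_per_poll(128, 30))   # 3840
    print(clock_lookups_per_day(128, 30))    # 22118400
```

Reading the clock once per XML snapshot instead would bring the per-poll count down to a single lookup, whatever the cluster size.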

there is also another more subtle problem.  the cluster timestamp in the 
XML was added to handle clusters scattered over different timezones.  your 
method is right on because it will ensure that the time line for metrics 
matches the real time on the remote clusters (and not the time of the 
machine collecting the cluster data).

also... your method ensures the data values match exactly the time they 
are valid... the old method is VERY sensitive to the time of parsing the 
XML and saving to the databases.
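
A rough sketch of the idea, in Python rather than gmetad's C, and not the actual patch: pick one timestamp per XML snapshot (the cluster-reported time if present, otherwise a single local clock read) and reuse it for every update. The helper names and the (host, metric) dictionary layout are hypothetical. The only real interface relied on is the `timestamp:value` argument form of `rrdtool update`, where `N` means "now".

```python
import time
from typing import Callable, Dict, Optional, Tuple


def snapshot_timestamp(cluster_reported: Optional[int],
                       clock: Callable[[], float] = time.time) -> int:
    """Choose one timestamp for the whole snapshot.

    Prefer the time the remote cluster reported; fall back to a single
    local clock read if that value is missing.
    """
    if cluster_reported:
        return int(cluster_reported)
    return int(clock())


def rrd_update_arg(value: str, timestamp: Optional[int] = None) -> str:
    """Build the 'timestamp:value' argument for `rrdtool update`.

    With no timestamp this yields 'N:value', which leaves the time up
    to RRD at the moment of the write (the old behaviour).
    """
    if timestamp is None:
        return f"N:{value}"
    return f"{timestamp}:{value}"


def snapshot_updates(cluster_reported: Optional[int],
                     metrics: Dict[Tuple[str, str], str]) -> Dict[Tuple[str, str], str]:
    """Stamp every (host, metric) value in a snapshot with the same time."""
    ts = snapshot_timestamp(cluster_reported)  # one lookup per snapshot
    return {key: rrd_update_arg(val, ts) for key, val in metrics.items()}


if __name__ == "__main__":
    sample = {("node1", "load_one"): "0.42", ("node1", "cpu_user"): "3.1"}
    for key, arg in snapshot_updates(1050000000, sample).items():
        print(key, arg)
```

Because every value carries the snapshot's own time, a slow parse or a delayed write no longer shifts where the data lands on the time line.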

we need to wrap that change up into a release and get that out to the 
users soon.  actually, federico wrapped up a 2.5.2 release and i dropped 
the ball on getting it the rest of the way out the door.  i have been 
swamped with meetings, conferences and working on ganglia 3 lately... i 
think it might be smart if i focus back on getting ganglia 2.5.x cleaned 
up and ready for a good maintenance release.  

do you have a patch for the changes against the latest CVS source?  how 
has your fix been working for you lately?
-- 
matt

Wednesday, Steven Wagner wrote forth saying...

> So lately, as the size of one of my clusters' RRD directory balloons past 
> the third-of-a-gigabyte mark, I've been noticing a dramatic increase in 
> data gaps in some of the graphs.
> 
> I decided to put my money ... er, *development time* where my mouth ... er, 
> *whiny developer ranting* is and modified gmetad to try to use the 
> CLUSTER_REPORTED value in RRD_update() or its own version of NOW(), instead 
> of leaving that up to RRD.
> 
> And, well, boy, it sure does seem to be working.  The gaps had gotten to 
> the point where they represented well over 2/3 of the graph.  Now I'm back 
> to a steady stream of data.
> 
> I'll run this for a few days and make sure that nothing explodes in a 
> mountain of fiery goo before I send in a patch.
> 
> <rehash alert>
> 
> I don't really see a down-side to this, because after all, the XML snapshot 
> is from a specific time (represented pretty accurately by 
> CLUSTER_REPORTED), so why bother looking up the clock (NUMBER_OF_METRICS * 
> NUMBER_OF_HOSTS+1) times every time the data source thread runs?  Plus RRD 
> pukes if, for some reason, two updates happen to hit the same RRD in the 
> space of one second.
> 
> And if RRD returns an error code, the whole thing aborts... making gaps!
> 
> Anyway.
> 
> 
> 
> _______________________________________________
> Ganglia-developers mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/ganglia-developers
> 


