So lately, as the RRD directory for one of my clusters balloons past the third-of-a-gigabyte mark, I've been noticing a dramatic increase in data gaps in some of the graphs.

I decided to put my money ... er, *development time* where my mouth ... er, *whiny developer ranting* is and modified gmetad to try to use the CLUSTER_REPORTED value in RRD_update(), or its own version of NOW(), instead of leaving the timestamp up to RRD.

And, well, boy, it sure does seem to be working. The gaps had gotten to the point where they represented well over 2/3 of the graph. Now I'm back to a steady stream of data.

I'll run this for a few days and make sure that nothing explodes in a mountain of fiery goo before I send in a patch.

<rehash alert>

I don't really see a downside to this. After all, the XML snapshot is from a specific time (represented pretty accurately by CLUSTER_REPORTED), so why bother looking up the clock (NUMBER_OF_METRICS * NUMBER_OF_HOSTS + 1) times every time the data source thread runs? Plus RRD pukes if, for some reason, two updates happen to hit the same RRD in the space of one second.

And if RRD returns an error code, the whole thing aborts... making gaps!
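For anyone who wants the idea in concrete form, here is a rough sketch. It is in Python for brevity, not gmetad's C, and the function names, the dict of metrics, and the `last_update` bookkeeping are all made up for illustration; none of it is the actual patch. The only RRD facts it leans on are that an update argument looks like `timestamp:value` (with `N` meaning "ask the clock now") and that RRD refuses an update whose timestamp is not newer than the last one.

```python
def build_update_arg(value, reported=None):
    """Build the 'time:value' argument for an RRD update.

    With no reported time we fall back to 'N', which tells RRD to
    read the clock itself (the old behaviour described above).
    """
    stamp = "N" if reported is None else str(int(reported))
    return f"{stamp}:{value}"


def plan_updates(metrics, reported, last_update):
    """Stamp every metric in one snapshot with the same reported time.

    metrics     -- hypothetical {rrd_path: value} mapping for one snapshot
    reported    -- the snapshot's CLUSTER_REPORTED-style timestamp (seconds)
    last_update -- hypothetical {rrd_path: last timestamp written}, mutated here

    Returns the (rrd_path, update_arg) pairs worth sending. An RRD that
    already has data at or after `reported` is skipped instead of being
    handed an update that RRD would reject.
    """
    planned = []
    for path, value in metrics.items():
        if last_update.get(path, 0) >= reported:
            continue  # same second (or older): skip rather than error out
        planned.append((path, build_update_arg(value, reported)))
        last_update[path] = reported
    return planned
```

The clock is never consulted inside the loop, and a repeated snapshot with the same timestamp produces no updates at all, so there is nothing for RRD to reject.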

Anyway.

