So lately, as the size of one of my clusters' RRD directory balloons past
the third-of-a-gigabyte mark, I've been noticing a dramatic increase in
data gaps in some of the graphs.
I decided to put my money ... er, *development time* where my mouth ... er,
*whiny developer ranting* is and modified gmetad to try to use the
CLUSTER_REPORTED value in RRD_update(), or failing that its own version of
NOW(), instead of leaving the timestamp up to RRD.
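
For the curious, the idea boils down to something like the sketch below.
This is not the actual patch: format_rrd_update() and its parameters are
names made up for illustration, and the "use the reported time if it is
positive, otherwise fall back" rule is my assumption about how the fallback
would work. The only thing it leans on from RRD is that an update argument
has the form "timestamp:value", where a literal N in the timestamp slot
means "whatever the clock says right now".

```c
#include <stdio.h>

/* Hypothetical helper, not gmetad code: build the "timestamp:value"
 * argument for an RRD update.
 *
 * reported: snapshot time taken from the XML (e.g. CLUSTER_REPORTED),
 *           or 0 if the XML did not carry one (assumption).
 * now:      a clock reading the caller took ONCE per polling pass.
 *
 * Returns 0 on success, -1 if the buffer is too small. */
static int format_rrd_update(char *buf, size_t len,
                             long reported, long now, const char *value)
{
    /* Prefer the snapshot's own timestamp; fall back to the single
     * per-pass clock reading instead of letting RRD pick "N". */
    long when = (reported > 0) ? reported : now;

    int n = snprintf(buf, len, "%ld:%s", when, value);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Every metric from the same snapshot then gets the same timestamp, and the
clock is read once per pass rather than once per update.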
And, well, boy, it sure does seem to be working. The gaps had gotten to
the point where they represented well over 2/3 of the graph. Now I'm back
to a steady stream of data.
I'll run this for a few days and make sure that nothing explodes in a
mountain of fiery goo before I send in a patch.
<rehash alert>
I don't really see a down-side to this, because after all, the XML snapshot
is from a specific time (represented pretty accurately by
CLUSTER_REPORTED), so why bother looking up the clock (NUMBER_OF_METRICS *
NUMBER_OF_HOSTS+1) times every time the data source thread runs? Plus RRD
pukes if, for some reason, two updates happen to hit the same RRD in the
space of one second.
And if RRD returns an error code, the whole thing aborts... making gaps!
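
One way to dodge the same-second problem, sketched under the assumption
that we can afford to remember the last timestamp written to each RRD:
skip anything that isn't strictly newer instead of handing it to RRD and
aborting on the error. The struct and function names are invented for
illustration and don't exist in gmetad.

```c
/* Hypothetical per-RRD bookkeeping; not a gmetad structure. */
struct rrd_state {
    long last_update;   /* timestamp of the last accepted update, 0 if none */
};

/* Returns 1 if an update stamped `when` should be sent to RRD, or 0 if
 * it should be quietly skipped because it is not at least one second
 * newer than the previous one for this RRD.
 *
 * Note: this records `when` before the update actually happens; a real
 * version would probably record it only after RRD reports success. */
static int should_update(struct rrd_state *st, long when)
{
    if (when <= st->last_update)
        return 0;
    st->last_update = when;
    return 1;
}
```

Skipping one duplicate costs a single data point at worst, where aborting
the whole pass costs every metric in it.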
Anyway.