I recently noticed that sometimes my 'month' graphs were being plotted 
very blockily - with very few data points - at certain times.  After a bit 
of diagnostic work using rrdtool fetch, I discovered that for a period of 
maybe thirty minutes each day, it would return only 30ish data points, 
instead of 240.  There then followed a mind-bending discussion on the 
rrdtool developers list which you're welcome to dig out!

It turns out that, by design, rrdtool will favour 'coverage over 
resolution' - so if your 'year' rra provides better coverage for the 
requested interval than the 'month' rra, the year one will be used, even 
though the month one is much higher resolution.

The (default) RRD definition used by ganglia is
  RRA:AVERAGE:0.5:1:240
  RRA:AVERAGE:0.5:24:240
  RRA:AVERAGE:0.5:168:240
  RRA:AVERAGE:0.5:672:240
  RRA:AVERAGE:0.5:5760:370
with a step of 15 seconds, and the problem arises because 5760 is not a
multiple of 672.  rrdtool populates each rra in chunks, and will commit a
new value to, say, the 'month' rra when (time % (672x15)) == 0.

However, sometimes,
  (time % (672x15)) > (time % (5760x15))
meaning there is more 'uncommitted' time in the 'month' rra than in the 
'year' rra, so the 'year' rra is more up to date.

And the catch comes because graph.php always requests time periods ending 
NOW.  So, sometimes the 'year' rra gets closer to now than the 'month' 
rra, and so it uses that instead and you get an ugly graph.  Hurrah!
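
To make the comparison concrete, here is a minimal sketch of that test.
The function and constant names are made up for illustration (they are not
part of ganglia or rrdtool); the numbers come from the RRA list above.

```php
<?php
// Illustrative only: these names are hypothetical, not ganglia/rrdtool API.
const STEP_SECONDS = 15;   // step from the RRD definition above

// Seconds of data not yet committed to an rra that consolidates
// $pdpPerRow steps into each row.
function uncommitted_seconds(int $time, int $pdpPerRow): int
{
    return $time % ($pdpPerRow * STEP_SECONDS);
}

// True when the 'year' rra (5760) reaches closer to $time than the
// 'month' rra (672), which is the case that gives the blocky graph.
function year_is_fresher(int $time): bool
{
    return uncommitted_seconds($time, 672) > uncommitted_seconds($time, 5760);
}
```

For example, at time 86460 the 'month' rra has 5820 uncommitted seconds
but the 'year' rra only 60, so year_is_fresher(86460) is true; at 90720
(an exact multiple of 10080) it is false.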

There are two approaches to fixing this.  The first is to change the RRD 
definition, so that each RRA consolidates a number of steps which is an 
exact multiple of the previous RRA's.  This is a pain for those of us 
with many thousands of RRDs!
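
If you do want to go down that route, a quick check along these lines
shows which RRAs are out of step.  This is only a sketch: misaligned_rras
is a hypothetical helper, and it looks at nothing but the steps-per-row
figures from the definition above.

```php
<?php
// Hypothetical helper, not part of ganglia or rrdtool.
// Takes the steps-per-row value of each RRA, in order, and returns the
// [previous, current] pairs where current is not a multiple of previous.
function misaligned_rras(array $pdpPerRow): array
{
    $bad = [];
    for ($i = 1; $i < count($pdpPerRow); $i++) {
        if ($pdpPerRow[$i] % $pdpPerRow[$i - 1] !== 0) {
            $bad[] = [$pdpPerRow[$i - 1], $pdpPerRow[$i]];
        }
    }
    return $bad;
}
```

Called as misaligned_rras([1, 24, 168, 672, 5760]), it flags only the
672 -> 5760 pair, since 5760 = 8 x 672 + 384.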

My suggested approach is to amend graph.php so that, rather than 'now' as 
the endpoint, it uses the most recent exact multiple of the sample time 
period, such as 10080 (672x15) in the 'month' case.  That point is 
guaranteed to be covered by the highest resolution rra, which will 
therefore be used.  Given the current RRD definition you only need to 
apply this correction to 'month' plots, but I've done all of them - it 
stops you getting that blank final column :)

(My graph.php is locally modified, so here's example code rather than a 
patch)

==============================================================
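// Round the end time down to the last point the matching rra has
// committed: rows-per-point x step (15s) for each graph range.
switch ($start) {
case -3600:     $round = 15;    break;  // hour:  1 x 15
case -86400:    $round = 360;   break;  // day:   24 x 15
case -604800:   $round = 2520;  break;  // week:  168 x 15
case -2419200:  $round = 10080; break;  // month: 672 x 15
case -31449600: $round = 86400; break;  // year:  5760 x 15
default:        $round = 0;             // unknown range: no rounding
}
if ($round > 0) {
    $now = time();
    $end = $now - ($now % $round);      // stays an integer, unlike floor()
} else {
    $end = "N";
}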

$command = RRDTOOL . " graph - --start $start --end $end ".
[etc. etc.]
==============================================================

WARNING: our 'month' rra is only 28 days long; it was tempting to apply 
the same backwards correction to the start time as the end time, but if 
you do this, you'll fall off the _beginning_ of the rra, and once again 
rrdtool will give you the year plot, instead of the month one!

Phil

