This has bitten me off and on for a couple of years.

See this example of a chunky load graph from the summary of a cluster of servers:

http://www.rreeder.net/graph.load.php.gif

What was going on?

There is not enough 'time' in the first level of the RRD database, so it drops
down to the second level, which holds coarser data points, to cover the
requested one-hour duration.

The gmetad daemon was building the RRD database with 240 intervals of _13_
seconds, _not_ 15 seconds. That is 240 x 13 = 3120 seconds, or 52 minutes,
which is short of a full hour.
I don't know why gmetad was doing this ... I know it was from the output of:
rrdtool info <hostname-metric.rrd>
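
A rough sketch of how that check could be scripted, assuming the usual
"key = value" layout of rrdtool info output (step, rra[0].pdp_per_row,
rra[0].rows). The function name rra0_coverage is made up for illustration:

```shell
# Print how many seconds the first archive (RRA 0) of an RRD covers.
# Reads `rrdtool info` output on stdin.
# Assumes lines of the form "step = 13" and "rra[0].rows = 240".
rra0_coverage() {
  awk -F' = ' '
    $1 == "step"               { step = $2 }
    $1 == "rra[0].pdp_per_row" { pdp  = $2 }
    $1 == "rra[0].rows"        { rows = $2 }
    END { print step * pdp * rows }
  '
}
```

Usage would be along the lines of: rrdtool info <hostname-metric.rrd> | rra0_coverage
Anything under 3600 means the first level cannot hold a full hour.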

Since the number 240 for intervals is _hard coded_ in the gmetad server, I
don't know why you don't go ahead and hard code the length of the interval to
15 seconds as well...  I have done this at my site, and built a patch and a
new rpm based on that patch.

I have also written a simple script to update the existing RRD databases and
fix the interval problem. It just adds a few intervals so we have enough for a
full hour.
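
As a sanity check on how many intervals are needed, here is a small sketch of
the arithmetic. The function name rows_to_add and its argument order are
hypothetical, not part of rrdtool or gmetad:

```shell
# Print how many rows must be added so that `rows` intervals of `step`
# seconds cover at least `target` seconds.
# Usage: rows_to_add STEP ROWS TARGET
rows_to_add() {
  step=$1
  rows=$2
  target=$3
  # Round up: the number of rows needed to reach the target duration.
  needed=$(( (target + step - 1) / step ))
  extra=$(( needed - rows ))
  if [ "$extra" -lt 0 ]; then
    extra=0
  fi
  echo "$extra"
}
```

For 13-second intervals, 240 existing rows, and a 3600-second target,
rows_to_add 13 240 3600 gives 37, so growing by 38 leaves one row to spare.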

#
# cd into the rrdtool database directory.
cd /var/lib/ganglia/rrd/
# Or, to fix just one cluster, cd into its directory instead: cd "Cluster Name"
# Only pass regular .rrd files to the script, not directories.
find . -type f -name '*.rrd' -exec bash /local1/test2.sh {} \;

# where the script test2.sh contains:
#!/bin/bash
# Grow the first archive (RRA 0) of each RRD file given as an argument by 38 rows.
for i
do
  echo "$i"
  # rrdtool resize writes its result to resize.rrd in the current directory.
  # Only replace the original if the resize succeeded.
  if sudo -u nobody rrdtool resize "$i" 0 GROW 38; then
    echo "move $i"
    sudo -u nobody cp -p "$i" "$i.old"
    sudo -u nobody mv -f resize.rrd "$i"
    sudo chown nobody:root *
    sudo -u nobody chmod oug+rw *
  fi
done

(Can tell I'm a _real_ programmer... Or just a confused one.)
This assumes you have sudo installed and configured for your ID. If not,
run the script as root and replace each
sudo -u nobody <command>
with
su -s /bin/bash nobody -c "<command>"
in the above.
Once you are happy that your charts look great, you might remove the backups:

find . -type f -name '*.old' -exec rm {} \;


