This has bitten me off and on for a couple of years.
See this example of a chunky load graph for the summary of a cluster of servers:
http://www.rreeder.net/graph.load.php.gif
What was going on?
There is not enough 'time' in the first level of the RRD database, so the graph drops down to the second level to get the requested one-hour duration.
The gmetad daemon was building the RRD database with 240 intervals of _13_
seconds, _not_ 15 seconds.
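To make the shortfall concrete, here is a small sketch of the arithmetic. The helper names coverage and rows_needed are made up for illustration, and it assumes each row of the first level holds exactly one 13-second data point.

```shell
#!/bin/bash
# Seconds covered by an archive: rows * step (assumes one data point per row).
coverage() { echo $(( $1 * $2 )); }

# Smallest number of rows that covers a target duration (ceiling division).
rows_needed() { echo $(( ($1 + $2 - 1) / $2 )); }

coverage 240 13        # 3120 seconds, i.e. 52 minutes: short of an hour
rows_needed 3600 13    # 277 rows, so at least 37 more than the 240 present
coverage 278 13        # 3614 seconds once 38 rows are added: just over an hour
```

With 15-second intervals the same 240 rows would cover exactly 3600 seconds, which is presumably what was intended.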
I don't know why gmetad was doing this, but I know that it was, from the output of:
rrdtool info <hostname-metric.rrd>
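If you want to check your own files, the relevant lines can be filtered out of the rrdtool info output. This is only a sketch: the function name rrd_first_level and the path in the usage comment are hypothetical, and it assumes the first level is archive number 0.

```shell
#!/bin/bash
# Print the step and the geometry of the first archive (RRA 0) from
# "rrdtool info" output read on stdin.
rrd_first_level() {
  grep -E '^(step|rra\[0\]\.(rows|pdp_per_row)) = '
}

# Usage (the path is a made-up example):
#   rrdtool info "Cluster Name/node01/load_one.rrd" | rrd_first_level
```

A step of 13 together with 240 rows and a pdp_per_row of 1 means the first level holds less than an hour.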
Since the number of intervals (240) is _hard coded_ in the gmetad server, I don't see why the length of the interval isn't simply hard coded to 15 seconds as well. I have done this at my site, and built a patch and a new rpm based on that patch.
I have also written a simple script to update the existing RRD databases and fix the interval problem. It just adds a few intervals so we have enough for a full hour.
#
# cd into the rrdtool database directory.
cd /var/lib/ganglia/rrd/
# Or, if only certain clusters look like this, cd "Cluster Name" instead.
find . -type f -name '*.rrd' -exec bash /local1/test2.sh {} \;
# where the script test2.sh contains:
#!/bin/bash
for i
do
  echo "$i"
  # rrdtool resize writes the grown copy to resize.rrd in the current directory
  if sudo -u nobody rrdtool resize "$i" 0 GROW 38; then
    echo "move $i"
    sudo -u nobody cp -p "$i" "$i.old"      # keep the original as a backup
    sudo -u nobody mv -f resize.rrd "$i"
    sudo chown nobody:root "$i" "$i.old"
    sudo -u nobody chmod oug+rw "$i" "$i.old"
  fi
done
# (Can tell I'm a _real_ programmer... Or just a confused one.)
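This assumes you have sudo installed and configured for your ID. If not, run the script as root, drop the plain sudo in front of chown, and replace each
sudo -u nobody <command>
with
su nobody -c "<command>"
in the above, for example: su nobody -c "rrdtool resize '$i' 0 GROW 38"
Note that su wants the command as one quoted string after -c, and that leaving off the "-" keeps you in the current directory, which the relative paths need. If nobody has no login shell on your system, you will probably also need to add -s /bin/bash.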
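Once you are happy that your charts look great, you might remove the backups (they sit next to the originals in the subdirectories, so a plain rm of *.old from the top level will not find them):
find . -type f -name '*.old' -exec rm {} \;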