Re: [Ganglia-general] Ganglia issues I've been experiencing

Ramon Bastiaans Fri, 11 Mar 2005 02:18:39 -0800

Jason A. Smith wrote:

Some sort of RAM disk is probably the only thing that will be able to
handle the rrd I/O load and allow gmetad to monitor more than a few
hundred nodes.  If your gmetad node is running Linux then I would
suggest using tmpfs which basically implements a POSIX filesystem in the
kernel's VFS.


All you have to do is decide how much space ganglia requires and add a
line like the following to /etc/fstab and mount it:

none  /var/lib/ganglia/rrds  tmpfs  size=1024M,mode=755,uid=nobody,gid=nobody  0 0
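After mounting, you can confirm that gmetad's rrd directory really is on tmpfs by looking it up in /proc/mounts. The following is a minimal Python sketch. It assumes a Linux host and the /var/lib/ganglia/rrds path from the fstab line above. The function name mount_fs_type is hypothetical and is not part of Ganglia.

```python
from pathlib import Path
from typing import Optional


def mount_fs_type(mount_point: str, mounts_text: Optional[str] = None) -> Optional[str]:
    """Return the filesystem type mounted at mount_point, or None if the
    path is not a mount point. Hypothetical helper; Linux only.

    Reads /proc/mounts unless mounts_text is supplied (handy for testing).
    """
    if mounts_text is None:
        mounts_text = Path("/proc/mounts").read_text()
    fs_type = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        # /proc/mounts escapes spaces in paths as \040
        mnt = fields[1].replace("\\040", " ")
        if mnt == mount_point:
            # keep the last match: a later mount shadows an earlier one
            fs_type = fields[2]
    return fs_type
```

For example, mount_fs_type("/var/lib/ganglia/rrds") should return "tmpfs" once the mount is active. It returns None if the directory is just an ordinary directory on the root filesystem.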

The documentation for tmpfs is located here:

/usr/src/linux/Documentation/filesystems/tmpfs.txt

I am using this with ganglia to monitor over 1300 nodes, split into 10
clusters, with a single gmetad and the load is fairly light.  It is only
using about 435MB of RAM to store all of the rrd files, or about
340kB/node.  Have you added extra gmetric data to reach 150MB for 275
nodes (559kB each)?

Yes, we also publish some extra metrics for our InfiniBand network cards and such. And after a more precise calculation it appears to be closer to 135 MB ;) (~13 kB per metric, 43 metrics per host, 275 hosts in the cluster)
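The per-metric arithmetic above can be turned into a rough way to choose the tmpfs size= value. This is a minimal Python sketch. It assumes every rrd file is roughly the same size and uses a hypothetical 1.5x headroom factor to allow for new hosts and metrics. The function names are illustrative only.

```python
import math


def rrd_footprint_kb(kb_per_metric: float, metrics_per_host: int, hosts: int) -> float:
    """Estimated total size of all rrd files, in kB (one file per metric per host)."""
    return kb_per_metric * metrics_per_host * hosts


def tmpfs_size_mb(footprint_kb: float, headroom: float = 1.5) -> int:
    """Round footprint_kb * headroom up to whole MB for the tmpfs size= option.

    headroom is an assumed safety factor, not a Ganglia setting.
    """
    return math.ceil(footprint_kb * headroom / 1024)
```

For instance, tmpfs_size_mb(rrd_footprint_kb(13, 43, 275)) plugs in the figures from the paragraph above. tmpfs only allocates memory for pages that are actually written, so size= is an upper limit rather than a reservation. That makes erring on the high side cheap.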

To save the data in case of a system crash, I just patched the gmetad
init script to backup and restore the rrds with tar when it is stopped
and started, then use a daily cronjob to restart gmetad every night.  I
stop gmetad before backing up; otherwise tar complains that one or more
files have changed while they were being read.  I have attached the init
patch that I use in case you are interested.
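Jason's actual init patch is an attachment that isn't reproduced here. The following is only a sketch of the same idea (back up on stop, restore on start) using Python's tarfile module. The paths and the function names backup_rrds and restore_rrds are assumptions for illustration. The backup must run after gmetad has been stopped, for the reason given above.

```python
import tarfile
from pathlib import Path


def backup_rrds(rrd_dir: str, archive: str) -> None:
    """Pack rrd_dir into a gzip'd tarball (hypothetical helper).

    Call only while gmetad is stopped, or files may change mid-read.
    """
    archive_path = Path(archive)
    tmp = archive_path.with_name(archive_path.name + ".tmp")
    with tarfile.open(str(tmp), "w:gz") as tar:
        # arcname="." stores member paths relative to rrd_dir
        tar.add(rrd_dir, arcname=".")
    # rename within the same directory: the old backup is replaced only
    # once the new one is complete
    tmp.replace(archive_path)


def restore_rrds(archive: str, rrd_dir: str) -> None:
    """Unpack the last backup into rrd_dir, e.g. a freshly mounted tmpfs."""
    if not Path(archive).exists():
        return  # first start: nothing to restore yet
    Path(rrd_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(rrd_dir)
```

Writing to a temporary file and renaming it means a crash during the backup leaves the previous archive intact. That matters because the on-disk copy is the only one that survives a reboot.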

~Jason

So if there is a kernel panic or similar (let's hope not), you only have the data from your last backup, which was taken at startup or at midnight, right? Aren't you losing large stretches of data if a crash occurs? I have been thinking about a ramdisk too, but I am somewhat reluctant because of the data loss in case of a crash. How can I keep a recent backup copy on disk, so that as little data as possible is lost in a crash?

I am also interested because for a project of mine I have to archive the rrd data for historical tracking of compute jobs and their performance. However, copying 135 MB worth of rrd files every hour in less than 15 seconds (the metric interval) is hardly doable. Using a ramdisk would certainly speed things up. I am tempted, though, to write another tool that archives the data directly from the multicast channel, but this would mean even more disk access (and load).
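To check whether an hourly archive copy fits inside the metric interval on a given machine, you can simply time it. This is a minimal Python sketch. It assumes the rrd tree is copied to a timestamped directory under a hypothetical archive root. snapshot_rrds and fits_interval are illustrative names. Because gmetad keeps writing during the copy, individual files may be caught mid-update, which is the same issue that makes tar complain.

```python
import shutil
import time
from datetime import datetime, timezone
from pathlib import Path
from typing import Tuple

# Metric interval quoted in the post; ideally the copy finishes within it.
METRIC_INTERVAL_S = 15


def snapshot_rrds(rrd_dir: str, archive_root: str) -> Tuple[Path, float]:
    """Copy the whole rrd tree to <archive_root>/<UTC timestamp>/.

    Hypothetical helper. Returns the destination directory and the
    wall-clock seconds the copy took. gmetad is not stopped, so a file
    may be copied while it is being updated.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    dest = Path(archive_root) / stamp
    start = time.monotonic()
    shutil.copytree(rrd_dir, dest)  # creates dest and any missing parents
    elapsed = time.monotonic() - start
    return dest, elapsed


def fits_interval(elapsed_s: float, interval_s: float = METRIC_INTERVAL_S) -> bool:
    """True if the copy finished before the next metric update was due."""
    return elapsed_s < interval_s
```

Calling dest, secs = snapshot_rrds("/var/lib/ganglia/rrds", "/archive/ganglia") from an hourly cron job shows directly whether the copy finishes within one metric interval. Pass secs to fits_interval to get the answer as a boolean. The /archive/ganglia path is only a placeholder.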


- Ramon.
