Some sort of RAM disk is probably the only thing that will be able to
handle the rrd I/O load and allow gmetad to monitor more than a few
hundred nodes.  If your gmetad node is running Linux, then I would
suggest using tmpfs, which basically implements a POSIX filesystem in the
kernel's VFS and keeps its contents in memory.

All you have to do is decide how much space ganglia requires and add a
line like the following to /etc/fstab and mount it:

none  /var/lib/ganglia/rrds  tmpfs  size=1024M,mode=755,uid=nobody,gid=nobody  0 0
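
After mounting, it can be worth confirming that the directory really is
backed by tmpfs before starting gmetad.  A minimal sketch, assuming a
Linux-style mounts table (/proc/mounts); the function name is_tmpfs_mount
is made up for illustration:

```shell
# Hypothetical helper: succeed if the given directory is a tmpfs mount point.
# $1: directory to check
# $2: mounts table to read (assumed default: /proc/mounts on Linux)
is_tmpfs_mount() {
    # Field 2 of each line is the mount point, field 3 the filesystem type.
    awk -v dir="$1" '$2 == dir && $3 == "tmpfs" { found = 1 }
                     END { exit !found }' "${2:-/proc/mounts}"
}
```

As root, something like "mount /var/lib/ganglia/rrds && is_tmpfs_mount
/var/lib/ganglia/rrds" should then succeed.  Note that mounting over a
directory hides whatever rrds were already in it, so copy them aside first
if you want to keep them.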

The documentation for tmpfs is located here:

/usr/src/linux/Documentation/filesystems/tmpfs.txt

I am using this with ganglia to monitor over 1300 nodes, split into 10
clusters, with a single gmetad and the load is fairly light.  It is only
using about 435MB of RAM to store all of the rrd files, or about
340kB/node.  Have you added extra gmetric data to reach 150MB for 275
nodes (559kB each)?
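
The per-node arithmetic above can be turned into a rough size= estimate
for the fstab entry.  This is only a sketch: the function name
rrd_tmpfs_size_mb and the 50% default headroom are assumptions, not
something gmetad provides, and it treats 1 MB as 1024 kB.

```shell
# Hypothetical helper: estimate a tmpfs size in MB for the rrd directory.
# $1: number of nodes
# $2: rrd footprint per node in kB
# $3: headroom in percent (assumed default: 50)
# The body runs in a subshell so the working variables stay local.
rrd_tmpfs_size_mb() (
    nodes="$1"
    kb_per_node="$2"
    headroom_pct="${3:-50}"
    total_kb=$(( nodes * kb_per_node * (100 + headroom_pct) / 100 ))
    # Round up to the next whole MB.
    echo $(( (total_kb + 1023) / 1024 ))
)
```

For example, "rrd_tmpfs_size_mb 1300 340" prints 648, and
"rrd_tmpfs_size_mb 275 559 0" (no headroom) prints 151.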

To save the data in case of a system crash, I patched the gmetad
init script to back up and restore the rrds with tar when it is stopped
and started, and I use a daily cron job to restart gmetad every night.  I
stop gmetad before backing up; otherwise tar complains that one or more
files changed while they were being read.  I have attached the init
patch that I use in case you are interested.
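
Outside the init script, the same backup/restore cycle could be sketched
roughly as follows.  This is a variation, not the attached patch: the
function names are made up, it uses "tar -C" with relative paths instead of
-P, and it restores only into an empty directory rather than testing for
__SummaryInfo__.  Keep the archive outside the rrd directory.

```shell
# Hypothetical helper: pack the rrd directory into a tar archive.
# $1: rrd directory, $2: archive path (must live outside $1)
backup_rrds() {
    # Write to a temporary file first so a failed run cannot clobber
    # the last good backup.
    tar -cf "$2.tmp" -C "$1" . && mv -f "$2.tmp" "$2"
}

# Hypothetical helper: unpack the archive only when the directory is
# empty, e.g. a freshly mounted tmpfs after a reboot.
# $1: rrd directory, $2: archive path
restore_rrds() {
    [ -r "$2" ] || return 1
    [ -z "$(ls -A "$1")" ] || return 0
    tar -xf "$2" -C "$1"
}
```

Call backup_rrds only after gmetad has stopped.  The nightly restart could
then be a crontab entry along the lines of "0 3 * * * /etc/init.d/gmetad
restart"; the time and the init script path are assumptions.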

~Jason


On Thu, 2005-03-10 at 08:41, Ramon Bastiaans wrote:
> Dan Moniz wrote:
> 
> > <snip>
> 
> > 3) Load on the monitor host/head node seems higher than it should be. 
> > It hovers around 2.6 - 3.0. While other software is running on this 
> > host, shutting down gmetad results in load falling back down to levels 
> > similar to other compute hosts (since the monitor host/head node is 
> > currently also a host in the Compute Hosts cluster). Also, in concert 
> > with the higher than expected load, ssh sessions to the monitor 
> > host/head node seem to take a long time to establish. Again, shutting 
> > down gmetad seems to alleviate these problems. While neither of these
> > issues prevents work from being done or gmetad from working (in
> > the current configuration), the load does seem abnormally high and is
> > something of an annoyance.
> >
> Does this happen all the time, or do you happen to have a web browser
> open all the time on the ganglia page? If so, I might know why.
> 
> Over here we noticed that when one or especially multiple people have a
> web browser open continuously, it generates a bigger load on the web
> frontend server. This seemed to happen because the cluster overview page
> shows all host graphs by default, and it refreshes automatically.
> 
> This means that every time the overview automatically refreshes, it redraws
> 280 host graphs, which can be quite expensive depending on hardware.
> 
> If this seems to be the case for you, I have a little patch that sets the
> cluster overview to not show the host graphs by default. This
> decreased the load on our web frontend server. It still stays around 0.9
> over here, but that's better than 2.5+.
> 
> I would also recommend running the gmetad / web frontend on a separate
> machine and not on your head/login node if you can spare the hardware.
> 
> You could also use a ramdisk as Matt suggested to store the .rrd files, if
> you have enough RAM in the machine. However, our cluster (275 machines)
> generates about 150 MB worth of .rrd files, which is a pretty big
> chunk of RAM.
> 
> 
> Ramon.
> 
> 
> _______________________________________________
> Ganglia-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/ganglia-general
-- 
/------------------------------------------------------------------\
|  Jason A. Smith                          Email:  [EMAIL PROTECTED] |
|  Atlas Computing Facility, Bldg. 510M    Phone:  (631)344-4226   |
|  Brookhaven National Lab, P.O. Box 5000  Fax:    (631)344-7616   |
|  Upton, NY 11973-5000                                            |
\------------------------------------------------------------------/

diff -uNr ganglia-monitor-core-2.5.3-dist/gmetad/gmetad.init ganglia-monitor-core-2.5.3/gmetad/gmetad.init
--- ganglia-monitor-core-2.5.3-dist/gmetad/gmetad.init	Fri Oct 18 17:57:58 2002
+++ ganglia-monitor-core-2.5.3/gmetad/gmetad.init	Thu Aug 28 14:09:27 2003
@@ -10,8 +10,21 @@
 
 RETVAL=0
 
+# For improved performance, make a tmpfs filesystem to store the rrds into memory:
+#  - Add a line like this to your /etc/fstab file:
+# $ echo -e "none\t\t\t/var/lib/ganglia/rrds\ttmpfs\tsize=500M,mode=755,uid=nobody,gid=nobody\t0 0" >>/etc/fstab
+# Comment this out to disable the tmpfs database backup/restore:
+TMPFS=1
+
 case "$1" in
    start)
+      # Restore the data backup if necessary:
+      if [ "$TMPFS" -a ! -d /var/lib/ganglia/rrds/__SummaryInfo__ -a -r /var/lib/ganglia/rrds-backup.tar ]; then
+	echo -n "Restoring gmetad's rrds database from saved backup...."
+	tar -xPf /var/lib/ganglia/rrds-backup.tar
+	echo "done."
+      fi
+
       echo -n "Starting GANGLIA gmetad: "
       [ -f $GMETAD ] || exit 1
 
@@ -27,6 +40,13 @@
       RETVAL=$?
       echo
       [ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/gmetad
+
+      # Make a backup of gmetad's database directory:
+      if [ "$TMPFS" -a "$RETVAL" -eq "0" -a -d /var/lib/ganglia/rrds/__SummaryInfo__ ]; then
+	echo -n "Saving gmetad's rrds database directory to disk...."
+	tar -cPf /var/lib/ganglia/rrds-backup.tar /var/lib/ganglia/rrds
+	echo "done."
+      fi
       ;;
 
   restart|reload)
