Some sort of RAM disk is probably the only thing that will be able to
handle the rrd I/O load and allow gmetad to monitor more than a few
hundred nodes. If your gmetad node is running Linux, I would suggest
using tmpfs, a POSIX filesystem that keeps all of its files in virtual
memory.
All you have to do is decide how much space ganglia requires and add a
line like the following to /etc/fstab and mount it:
    none  /var/lib/ganglia/rrds  tmpfs  size=1024M,mode=755,uid=nobody,gid=nobody  0 0
The documentation for tmpfs is located here:
/usr/src/linux/Documentation/filesystems/tmpfs.txt
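
Before choosing a size, it may help to measure how much space the
existing rrds take on disk. The following is only a rough sketch: the
helper name rrd_usage_mb is made up for illustration, and the
switch-over steps in the comments assume a Red Hat style init script at
/etc/init.d/gmetad, which may not match your system. Note that mounting
a tmpfs over a directory hides whatever was in it, so the old data has
to be moved aside and copied back in.

```shell
# Hypothetical helper: report how many MB (rounded up) a directory uses.
# Defaults to the rrd directory used in the fstab line above.
rrd_usage_mb() {
    dir=${1:-/var/lib/ganglia/rrds}
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    # du -sk prints the total in kB; round up to whole MB.
    du -sk "$dir" | awk '{ print int(($1 + 1023) / 1024) }'
}

# Sketch of a one-time switch-over (run as root; deliberately left as
# comments, and the init script path is an assumption):
#   /etc/init.d/gmetad stop
#   mv /var/lib/ganglia/rrds /var/lib/ganglia/rrds.disk
#   mkdir /var/lib/ganglia/rrds
#   mount /var/lib/ganglia/rrds        # uses the new /etc/fstab entry
#   cp -a /var/lib/ganglia/rrds.disk/. /var/lib/ganglia/rrds/
#   df -h /var/lib/ganglia/rrds        # confirm filesystem type and size
#   /etc/init.d/gmetad start
```

Running rrd_usage_mb with no argument prints the current size of
/var/lib/ganglia/rrds in MB, which gives a lower bound for size=.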
I am using this with ganglia to monitor over 1300 nodes, split into 10
clusters, with a single gmetad and the load is fairly light. It is only
using about 435MB of RAM to store all of the rrd files, or about
340kB/node. Have you added extra gmetric data to reach 150MB for 275
nodes (about 559kB each)?
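
The per-node figures above can be reproduced with shell integer
arithmetic. This is just a sketch: both function names are hypothetical,
1MB is taken as 1024kB, and the 50% headroom in the sizing example is an
arbitrary assumption rather than a recommendation.

```shell
# Per-node rrd footprint in kB, rounded to the nearest kB.
# $1 = total rrd size in MB, $2 = number of nodes
kb_per_node() {
    total_mb=$1; nodes=$2
    echo $(( (total_mb * 1024 + nodes / 2) / nodes ))
}

# Estimate a tmpfs size= value in MB, rounded up.
# $1 = number of nodes, $2 = kB per node, $3 = headroom in percent (assumed 50)
tmpfs_size_mb() {
    nodes=$1; per_node_kb=$2; headroom=${3:-50}
    total_kb=$(( nodes * per_node_kb * (100 + headroom) / 100 ))
    echo $(( (total_kb + 1023) / 1024 ))
}

kb_per_node 435 1300    # prints 343
kb_per_node 150 275     # prints 559
```

For example, tmpfs_size_mb 1300 343 suggests a size in MB for 1300 nodes
at roughly the per-node footprint measured here.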
To save the data in case of a system crash, I patched the gmetad init
script to back up and restore the rrds with tar when it is stopped and
started, and I use a daily cron job to restart gmetad every night. I
stop gmetad before backing up; otherwise tar complains that one or more
files have changed while being read. I have attached the init patch
that I use in case you are interested.
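
The backup and restore steps can also be sketched as two standalone
functions. This is not the attached patch: the function names are
hypothetical, the archive is written with relative paths via tar -C
(the patch uses tar -P with absolute paths), and the temporary-file
step is an added precaution. The cron line in the comment assumes an
/etc/cron.d style entry and an init script at /etc/init.d/gmetad.

```shell
# Save the rrd tree to a tarball; meant to run after gmetad has stopped.
# $1 = rrd directory, $2 = tarball path
rrd_backup() {
    dir=$1; tarball=$2
    # Only back up a populated tree (the patch checks for __SummaryInfo__).
    [ -d "$dir/__SummaryInfo__" ] || return 1
    # Write to a temporary file so a failed tar keeps the previous backup.
    tar -cf "$tarball.tmp" -C "$(dirname "$dir")" "$(basename "$dir")" &&
        mv -f "$tarball.tmp" "$tarball"
}

# Restore the rrd tree; meant to run before gmetad starts.
# Does nothing if data is already present or no backup is readable.
rrd_restore() {
    dir=$1; tarball=$2
    [ ! -d "$dir/__SummaryInfo__" ] && [ -r "$tarball" ] || return 0
    tar -xf "$tarball" -C "$(dirname "$dir")"
}

# Nightly restart so a backup is taken once a day (assumed cron.d entry):
#   0 4 * * * root /etc/init.d/gmetad restart
```

With the paths from the patch, the calls would be
rrd_backup /var/lib/ganglia/rrds /var/lib/ganglia/rrds-backup.tar and
rrd_restore with the same two arguments.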
~Jason
On Thu, 2005-03-10 at 08:41, Ramon Bastiaans wrote:
> Dan Moniz wrote:
>
> > <snip>
>
> > 3) Load on the monitor host/head node seems higher than it should be.
> > It hovers around 2.6 - 3.0. While other software is running on this
> > host, shutting down gmetad results in load falling back down to levels
> > similar to other compute hosts (since the monitor host/head node is
> > currently also a host in the Compute Hosts cluster). Also, in concert
> > with the higher than expected load, ssh sessions to the monitor
> > host/head node seem to take a long time to establish. Again, shutting
> > down gmetad seems to alleviate these problems. While both of these
> > issues don't prevent work from being done or gmetad from working (in
> > the current configuration), it does seem abnormally high and is
> > something of an annoyance.
> >
> Does this happen all the time, or do you happen to have a web browser
> open all the time on the ganglia page? If so, I might know why.
>
> Over here we noticed that when one or especially multiple people have a
> web browser open continuously, it generates a bigger load on the web
> frontend server. This seemed to happen because the cluster overview page
> shows all host graphs by default, and it refreshes automatically.
>
> Meaning that every time the overview automatically refreshes, it redraws
> 280 host graphs, which can be quite consuming depending on hardware.
>
> If this seems to be the case, I have a little patch to set the
> default cluster overview to not show the host graphs by default. This
> decreased the load on our web frontend server. It still stays around 0.9
> over here, but that's better than 2.5+
>
> I would also recommend running the gmetad / web frontend on a separate
> machine and not on your head/login node if you can spare the hardware.
>
> You could also use a ramdisk as Matt suggested to store the .rrd's, if
> you have enough RAM in the machine. However our cluster (275 machines)
> generates about 150 MB worth of .rrd files, which is a pretty big
> chunk of RAM.
>
>
> Ramon.
--
/------------------------------------------------------------------\
| Jason A. Smith Email: [EMAIL PROTECTED] |
| Atlas Computing Facility, Bldg. 510M Phone: (631)344-4226 |
| Brookhaven National Lab, P.O. Box 5000 Fax: (631)344-7616 |
| Upton, NY 11973-5000 |
\------------------------------------------------------------------/
diff -uNr ganglia-monitor-core-2.5.3-dist/gmetad/gmetad.init ganglia-monitor-core-2.5.3/gmetad/gmetad.init
--- ganglia-monitor-core-2.5.3-dist/gmetad/gmetad.init Fri Oct 18 17:57:58 2002
+++ ganglia-monitor-core-2.5.3/gmetad/gmetad.init Thu Aug 28 14:09:27 2003
@@ -10,8 +10,21 @@
RETVAL=0
+# For improved performance, make a tmpfs filesystem to store the rrds into memory:
+# - Add a line like this to your /etc/fstab file:
+# $ echo -e "none\t\t\t/var/lib/ganglia/rrds\ttmpfs\tsize=500M,mode=755,uid=nobody,gid=nobody\t0 0" >>/etc/fstab
+# Comment this out to disable the tmpfs database backup/restore:
+TMPFS=1
+
case "$1" in
start)
+ # Restore the data backup if necessary:
+ if [ "$TMPFS" -a ! -d /var/lib/ganglia/rrds/__SummaryInfo__ -a -r /var/lib/ganglia/rrds-backup.tar ]; then
+ echo -n "Restoring gmetad's rrds database from saved backup...."
+ tar -xPf /var/lib/ganglia/rrds-backup.tar
+ echo "done."
+ fi
+
echo -n "Starting GANGLIA gmetad: "
[ -f $GMETAD ] || exit 1
@@ -27,6 +40,13 @@
RETVAL=$?
echo
[ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/gmetad
+
+ # Make a backup of gmetad's database directory:
+ if [ "$TMPFS" -a "$RETVAL" -eq "0" -a -d /var/lib/ganglia/rrds/__SummaryInfo__ ]; then
+ echo -n "Saving gmetad's rrds database directory to disk...."
+ tar -cPf /var/lib/ganglia/rrds-backup.tar /var/lib/ganglia/rrds
+ echo "done."
+ fi
;;
restart|reload)