Since we have been slowly increasing the number of clusters and hosts
that we are monitoring with ganglia, I have been watching closely how
the gmetad host is handling the increased load and experimenting with a
few alternatives for locations to store the rrds.

At first the rrds were just on a partition made up of a pair of RAIDed
disks, which obviously didn't scale very far.  Then I tried moving to a
filesystem image mounted via the loopback device.  I think this helped
to aggregate the thousands of disk accesses into updates to a single
file, but as the size grew to a few hundred megabytes total (several
hundred nodes), the disk I/O started to put too much load on the
gmetad node.  One advantage of this method is that it is fairly easy to
set up, and the rrds are still being written to a physical disk, so you
don't lose any data on a reboot.

Then I tried experimenting with ramfs, since it is much easier to set up
than a ramdisk, but I had a few problems with it that I suspect might
be bugs in the Linux kernel.  Since the databases were now stored only
in RAM, I had a cronjob that would run every hour to back up the rrds
directory.  Occasionally a process (either gmetad or tar) would go into
an uninterruptible sleep state, which would lock up the ramfs partition.
I would be forced to reboot in order to continue collecting data.  Also,
it appeared that there was an almost 50% overhead in using ramfs (my
225MB rrds directory would consume about 337MB of RAM).
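
For anyone who wants to try the same kind of hourly backup, here is a
rough sketch of it in Python rather than plain tar.  This is not the
script I actually ran; the function name backup_rrds and any paths you
pass to it are made up for illustration.  The only real idea in it is
to write the archive to a temporary file first and rename it into place,
so that a backup that dies halfway never overwrites the last good copy.

```python
import os
import tarfile
import tempfile


def backup_rrds(rrd_dir, backup_path):
    """Archive rrd_dir (the RAM-backed rrds tree) into backup_path.

    backup_path should live on a physical disk.  Both arguments are
    hypothetical; substitute your own locations.
    """
    backup_dir = os.path.dirname(os.path.abspath(backup_path))
    # Create the temporary file next to the final archive so that the
    # rename below stays on one filesystem and is atomic.
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=backup_dir)
    os.close(fd)
    try:
        with tarfile.open(tmp_path, "w:gz") as tar:
            # Store paths relative to the directory name (e.g. "rrds/...").
            top = os.path.basename(os.path.normpath(rrd_dir))
            tar.add(rrd_dir, arcname=top)
        # Replace the previous archive only after the new one is complete.
        os.replace(tmp_path, backup_path)
    except BaseException:
        # Clean up the partial archive and keep the old backup untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return backup_path
```

Called once an hour from cron, this keeps a single rolling archive on
disk; after a reboot the rrds directory can be repopulated by unpacking
that archive before gmetad is started.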

Then I finally settled on a ramdisk, although it takes a little more
effort to set up and use.  The performance appears to be about the same
as ramfs, without the lockups and the 50% overhead, and it really helps
to alleviate the disk I/O load on the gmetad node.
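
In case the 50% figure looks odd, it is just the ratio of the two
numbers I gave for ramfs (225MB of rrds using about 337MB of RAM).  A
trivial check, with a helper name I made up for the purpose:

```python
def overhead_fraction(data_mb, ram_mb):
    """Extra RAM consumed, as a fraction of the size of the data."""
    return (ram_mb - data_mb) / data_mb


# Numbers from the ramfs test: 225MB of rrds, about 337MB of RAM used.
print(f"{overhead_fraction(225, 337):.1%}")  # prints 49.8%
```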

So, how do other people handle their large database directories?  Is
everyone using a ramdisk or has anyone used ramfs successfully?  How
different will things be in the upcoming ganglia3 release?  Will the
rrds be basically the same as they are now or will there be major
changes in that part of ganglia also?

~Jason


-- 
/------------------------------------------------------------------\
|  Jason A. Smith                          Email:  [EMAIL PROTECTED] |
|  Atlas Computing Facility, Bldg. 510M    Phone:  (631)344-4226   |
|  Brookhaven National Lab, P.O. Box 5000  Fax:    (631)344-7616   |
|  Upton, NY 11973-5000                                            |
\------------------------------------------------------------------/

