Since we have been slowly increasing the number of clusters and hosts that we are monitoring with ganglia, I have been watching closely how the gmetad host is handling the increased load and experimenting with a few alternative locations for storing the rrds.
At first it was just on a partition made up of a pair of RAIDed disks, which obviously didn't scale very far.

Then I tried moving to a filesystem image mounted via the loopback device. I think this helped to aggregate the thousands of disk accesses into updates to a single file, but as the total size grew to a few hundred megabytes (several hundred nodes), the disk I/O started to put too much load on the gmetad node. One advantage of this method is that it is fairly easy to set up, and the rrds are still being written to a physical disk, so you don't lose any data on a reboot.

Then I tried experimenting with ramfs, since it is much easier to set up than a ramdisk, but I had a few problems with it that I suspect might be bugs in the Linux kernel. Since the databases are now stored only in RAM, I had a cron job that would run every hour to back up the rrds directory. Occasionally a process (either gmetad or tar) would go into an uninterruptible sleep state, which would lock up the ramfs partition, and I would be forced to reboot in order to continue collecting data. Also, there appeared to be almost 50% overhead in using ramfs (my 225MB rrds directory would consume about 337MB of RAM).

I finally settled on a ramdisk, although it takes a little more effort to set up and use. The performance appears to be about the same as ramfs, without the lockups and the 50% overhead, and it really helps to alleviate the disk I/O load on the gmetad node. (I've put a rough sketch of my setup at the end of this message.)

So, how do other people handle their large database directories? Is everyone using a ramdisk, or has anyone used ramfs successfully? How different will things be in the upcoming ganglia3 release? Will the rrds be basically the same as they are now, or will there be major changes in that part of ganglia also?

~Jason

--
/------------------------------------------------------------------\
|  Jason A. Smith                         Email: [EMAIL PROTECTED] |
|  Atlas Computing Facility, Bldg. 510M       Phone: (631)344-4226 |
|  Brookhaven National Lab, P.O. Box 5000       Fax: (631)344-7616 |
|  Upton, NY 11973-5000                                            |
\------------------------------------------------------------------/
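P.S. For anyone curious, here is roughly what the ramdisk setup and the hourly backup look like on my gmetad node. The paths, the ramdisk device, and the backup location are just examples from my setup, and the ramdisk size itself has to be set with the ramdisk_size= kernel boot parameter, so make sure it is big enough to hold all of your rrds:

    # make a filesystem on one of the kernel's ramdisk devices and
    # mount it where gmetad expects to find the rrds
    mke2fs -m 0 /dev/ram0
    mount /dev/ram0 /var/lib/ganglia/rrds
    chown nobody /var/lib/ganglia/rrds   # or whatever user gmetad drops privileges to

    # hourly crontab entry to copy the in-RAM rrds back to real disk
    0 * * * * tar czf /var/backups/rrds.tar.gz -C /var/lib/ganglia rrds

    # after a reboot, restore the last backup before starting gmetad
    tar xzf /var/backups/rrds.tar.gz -C /var/lib/ganglia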

