On Fri, Mar 11, 2005 at 11:18:26AM +0100, Ramon Bastiaans wrote:
> Jason A. Smith wrote:
>
> >Some sort of RAM disk is probably the only thing that will be able to
> >handle the rrd I/O load and allow gmetad to monitor more than a few
> >hundred nodes. If your gmetad node is running Linux then I would
> >suggest using tmpfs which basically implements a POSIX filesystem in the
> >kernel's VFS.
> >
> >All you have to do is decide how much space ganglia requires and add a
> >line like the following to /etc/fstab and mount it:
> >
> >none /var/lib/ganglia/rrds tmpfs \
> >  size=1024M,mode=755,uid=nobody,gid=nobody 0 0
> >
> >The documentation for tmpfs is located here:
> >
> >/usr/src/linux/Documentation/filesystems/tmpfs.txt
> >
> >I am using this with ganglia to monitor over 1300 nodes, split into 10
> >clusters, with a single gmetad and the load is fairly light. It is only
> >using about 435MB of RAM to store all of the rrd files, or about
> >340kB/node. Have you added extra gmetric data to reach 150MB for 275
> >nodes (559kb each)?
> >
> Yes, we also distribute some metric's on our infiniband network cards
> and such. And it appears to be more around 135 Mb after making a more
> precise calculation ;) (~13 Kb per metric, 43 metrics per host, 275
> hosts in the cluster)
>
> >To save the data in case of a system crash, I just patched the gmetad
> >init script to backup and restore the rrds with tar when it is stopped
> >and started, then use a daily cronjob to restart gmetad every night. I
> >stop gmetad before backing up otherwise tar complains that one or more
> >files has changed while it was being read. I have attached the init
> >patch that I use in case you are interested.
> >
> >~Jason
> >
> So if there is a kernel panic or similar (lets hope not) you only have
> the data from your last backup, which was on startup or at midnight,
> right? Aren't you loosing big timeframes in case a crash occures? I have
> been thinking about a ramdisk too, but are somewhat reluctant on the
> data loss in case of a crash. How to maintain a recent (backup) copy to
> disk, so that as little data as possible is lost on a crash?
>
> I am also interested because for a project of mine I have to archive the
> rrd data for the historical tracking of compute jobs and their
> performance. However, copy'ing 135 Mb worth of rrd files every hour - in
> less than 15 seconds (metric interval) - is hardly doable. Using a
> ramdisk would certainly speed up things. I am tempted though to write an
> other tool who archives the data directly from the multicast channel,
> but this would mean even more disk access (and load).

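As a sanity check on the sizing figures quoted above, here is a rough
Python sketch of the arithmetic. It uses only numbers from this thread;
the function names and the 1 MB = 1024 kB conversion are my own
assumptions, not anything taken from ganglia.

```python
def kb_per_node(total_mb, nodes):
    """Average rrd storage per node in kB (assumes 1 MB = 1024 kB)."""
    return total_mb * 1024.0 / nodes


def rrd_footprint_mb(kb_per_metric, metrics_per_host, hosts):
    """Estimated total rrd storage in MB for a cluster."""
    return kb_per_metric * metrics_per_host * hosts / 1024.0


def kb_per_metric(total_mb, metrics_per_host, hosts):
    """Per-metric rrd size in kB implied by a measured total."""
    return total_mb * 1024.0 / (metrics_per_host * hosts)


if __name__ == "__main__":
    # Jason: 435 MB over 1300 nodes.
    print(round(kb_per_node(435, 1300)))            # 343
    # Ramon's first estimate: 150 MB over 275 nodes.
    print(round(kb_per_node(150, 275)))             # 559
    # Ramon's rounded inputs: ~13 kB x 43 metrics x 275 hosts.
    print(round(rrd_footprint_mb(13, 43, 275)))     # 150
    # Per-metric size implied by the 135 MB total.
    print(round(kb_per_metric(135, 43, 275), 1))    # 11.7
```

Note that the rounded "~13 Kb per metric" figure reproduces the
original 150 MB estimate; the revised 135 MB total works out to closer
to 11.7 kB per metric.
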
I've been having this problem on our FreeBSD cluster. One idea that
occurs to me (at least on FreeBSD) is to use file system snapshots to
make the temporary copy, which should be a lot faster than actually
copying the data, and then do the backup copy from the snapshot. It
won't solve the disk load problem, but it would reduce the amount of
time without gmetad running (assuming you need to worry about
consistency, which I'm not actually sure is the case).

If you wanted to avoid killing gmetad, it would be really cool to add
some signal handlers to stop and start disk writes. That's probably
non-trivial, but perhaps it's something that wouldn't be too hard when
Matt does the upcoming overhaul.

Another option altogether is to use a hardware RAM disk. I've been
thinking on and off about using something like the Cenatek RocketDrive,
which is a PCI device that presents SDRAM as a disk. It supports an
external power supply, so you shouldn't have to back up very often.

-- Brooks

--
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529 9BF0 5D8E 8BE9 F238 1AD4
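
For the question of keeping a recent on-disk copy of the rrds, here is
a minimal Python sketch of a periodic backup that never leaves a
half-written archive behind. It is not a file system snapshot and does
nothing to pause gmetad, so files may still change while they are being
read; the function name backup_rrds and the archive location are
hypothetical. Only /var/lib/ganglia/rrds comes from the thread.

```python
import os
import tarfile
import tempfile


def backup_rrds(rrd_dir, archive_path):
    """Archive rrd_dir to archive_path via a temp file and atomic rename.

    If the machine crashes mid-backup, the previous archive at
    archive_path is still intact, because the new one only takes its
    place once it has been completely written.
    """
    dest_dir = os.path.dirname(os.path.abspath(archive_path))
    # Create the temp file on the same file system as the destination
    # so that os.replace() is a rename rather than a copy.
    fd, tmp_path = tempfile.mkstemp(suffix=".tar.tmp", dir=dest_dir)
    os.close(fd)
    try:
        with tarfile.open(tmp_path, "w") as tar:
            # Store paths relative to the rrd directory's own name.
            name = os.path.basename(os.path.normpath(rrd_dir))
            tar.add(rrd_dir, arcname=name)
        os.replace(tmp_path, archive_path)
    except BaseException:
        # Clean up the partial archive and re-raise the error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return archive_path
```

Run from cron, this would be called with /var/lib/ganglia/rrds as
rrd_dir and an archive path on a real disk. Pointing it at a mounted
snapshot instead of the live directory would be the way to get a
consistent copy.
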

