I'm writing the RRDs to a local SCSI drive with an ext3 filesystem. I'll investigate the RAM disk option. Thanks!
Steve Gilbert
Unix Systems Administrator
[EMAIL PROTECTED]

-----Original Message-----
From: matt massie [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 17, 2003 5:02 PM
To: Steve Gilbert
Cc: '[email protected]'
Subject: Re: [Ganglia-general] Ganglia architecture and gmond load

steve-

the single biggest problem with scaling gmetad is disk i/o problems.  what
type of filesystem are you writing the gmetad RRDs to?

most people have had very good luck using a Ram-based filesystem and then
periodically syncing the data to disk.

for example in linux,

% mount -t tmpfs tmpfs /mnt

now the /mnt directory is a ram-backed filesystem.  if the machine is
rebooted however all the data is lost.  so you will need to write the
contents of that filesystem to disk every now and then.

-matt

Today, Steve Gilbert wrote forth saying...

> From: Steve Gilbert <[EMAIL PROTECTED]>
> To: "'[email protected]'"
>     <[email protected]>
> Date: Wed, 17 Sep 2003 16:45:26 -0700
> Subject: [Ganglia-general] Ganglia architecture and gmond load
>
> Hi folks,
>
> I don't know if I'm just trying to push Ganglia to more than it can handle
> or if I'm doing something wrong, but no matter how I design my Ganglia
> structure, gmetad seems to always crush the machine where it runs.  Here's
> an overview of my environment:
>
> Ganglia 2.5.4
> All hosts involved are running RedHat 7.2
> RRDtool version 1.0.45
>
> I have 16 subnets, each with 200 machines give or take a few.  I estimate
> around 3000 nodes total.  Some of these are dual P3, some are single P4, and
> a few random Xeon and Itanium nodes.  Every node is running gmond, and
> that's running fine.
>
> Each subnet has a "master" node that is a dual P3 1.3GHz.  This box provides
> DNS, NIS, and static DHCP for the subnet.  Normal load on these machines is
> very, very minimal.
>
> My first attempt was to set up a single dedicated Ganglia machine running
> gmetad, Apache, and the web frontend.  In this machine's gmetad.conf file, I
> listed each of the "master" nodes in the subnets as data sources.  I thought
> having one box collect all the data and store the RRD files would be great.
> Well, this was a bad idea...the box (a P4 with 2GB RAM) was absolutely
> crushed...load shot up to 8.5, and all the graphs continually had gaps in
> them.
>
> So my next attempt was to install gmetad on each of the "master" nodes.
> I would have this gmetad collect data for the subnet, and then run another
> gmetad on my Ganglia web machine to just talk to these 16 other gmetads.  I
> don't really like having to now backup 16 machines, but I've had problems
> before with trying to store RRD files on an NFS mount, so I decided not to
> try that.  This isn't working all that great, either...the gmetad on these
> "master" nodes (collecting data from ~200 hosts each) is also causing a
> pretty high load...the boxes now stay around 2-3 load points all the time
> and sometimes slows down other operations on the box.
>
> Am I doing something wrong, or is gmetad really this much of a resource hog?
> Anyone else trying to use Ganglia to monitor 3000 machines?  Am I asking too
> much?  Thanks for any insight.
>
> Steve Gilbert
> Unix Systems Administrator
> [EMAIL PROTECTED]
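
A minimal sketch of the tmpfs-plus-periodic-sync approach matt describes, assuming a Linux box with a `cp` that supports `-a`. The paths in RRD_DIR and BACKUP_DIR and the `save`/`restore` subcommands are hypothetical, made up for illustration; nothing here comes from Ganglia itself. The tmpfs mount from the thread would be done once at boot, before `restore` runs and before gmetad starts.

```shell
#!/bin/sh
# Sketch only: keep gmetad's RRDs on a RAM-backed filesystem and copy
# them to persistent disk every so often.
# RRD_DIR and BACKUP_DIR are assumed paths, not taken from the thread.
RRD_DIR="${RRD_DIR:-/mnt/rrds}"
BACKUP_DIR="${BACKUP_DIR:-/var/lib/ganglia/rrds-backup}"

# Copy the RAM-backed tree to disk (intended to run from cron).
save_rrds() {
    mkdir -p "$BACKUP_DIR" || return 1
    cp -a "$RRD_DIR/." "$BACKUP_DIR/"
}

# Repopulate the RAM-backed tree after a reboot, before gmetad starts.
restore_rrds() {
    mkdir -p "$RRD_DIR" || return 1
    # Nothing to restore on the very first boot.
    [ -d "$BACKUP_DIR" ] || return 0
    cp -a "$BACKUP_DIR/." "$RRD_DIR/"
}

# Hypothetical entry point: "save" or "restore"; anything else is a no-op.
case "${1:-}" in
    save)    save_rrds ;;
    restore) restore_rrds ;;
esac
```

Run with `restore` from a boot script right after the `mount -t tmpfs` step, and with `save` from cron at whatever interval of lost data is tolerable after a crash. Note that `cp` never deletes backup files for hosts that have since disappeared, and a copy taken while gmetad is writing may catch an RRD mid-update.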

