Cassandra Pugh <[email protected]> wrote:
> I am monitoring *1630 nodes.*
> So, I thought that perhaps it was getting overburdened at times from all
> the traffic. However I am using a very beefy machine, and do not see
> high cpu or memory usage even during these gaps. Also the network folks
> don't see any abnormally large traffic on the network.

gmetad is not very CPU-intensive or memory-intensive in my experience, but if you're storing all those RRD files on disk, you could easily be overwhelming the disk. Did you look for I/O wait?

I had to migrate my Ganglia installations to store their RRDs on tmpfs (RAM) at around 30 nodes, though I had a *lot* of custom metrics and each one gets stored per node. I also hear rumors that in the past year Ganglia has moved to a newer version of RRDtool that caches writes, so it should be able to handle staying on disk better than it did when I was running it. Still, with 1630 nodes, especially if you have some custom metrics, you could easily be overwhelming your disk.

If you see a lot of I/O wait, you could move to tmpfs like I did. I had a cron job to rsync to real disk every 10 minutes, and I changed the init script to rsync when you start and stop Ganglia. You can search this list's archives for my post, which includes my diffs to the init script and all the steps for what I did.

  -- Cos

_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general
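
For anyone wanting to check the I/O wait question above without extra tooling, here is a minimal sketch in Python, assuming a Linux host where /proc/stat is available. The function names (parse_cpu_line, iowait_fraction, sample_iowait) are made up for illustration and are not part of Ganglia or RRDtool. It takes two samples of the aggregate "cpu" line and reports what share of CPU time went to iowait in between, which is roughly what the "wa" column in top or vmstat shows.

```python
import time


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        # The aggregate line is labelled exactly "cpu"; per-core lines are cpu0, cpu1, ...
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_fraction(before, after):
    """Fraction of CPU time spent in iowait between two counter samples."""
    deltas = [b - a for a, b in zip(before, after)]
    if len(deltas) < 5:
        raise ValueError("kernel does not report an iowait counter")
    # Field order on Linux: user nice system idle iowait irq softirq steal ...
    # Only the first eight are summed, because the guest counters that may
    # follow are already included in user and nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return deltas[4] / total


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Sample /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open(path) as stat_file:
        before = parse_cpu_line(stat_file.read())
    time.sleep(interval)
    with open(path) as stat_file:
        after = parse_cpu_line(stat_file.read())
    return iowait_fraction(before, after)
```

Running sample_iowait() on the gmetad host during one of the gaps returns a value between 0 and 1. A persistently high value there, with low user and system time, would point at the disk rather than CPU or memory, which is the case where moving the RRDs to tmpfs is worth considering.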

