Hi folks,

I don't know if I'm just trying to push Ganglia beyond what it can handle
or if I'm doing something wrong, but no matter how I design my Ganglia
structure, gmetad always seems to crush the machine where it runs.  Here's
an overview of my environment:

Ganglia 2.5.4
All hosts involved are running RedHat 7.2
RRDtool version 1.0.45

I have 16 subnets, each with 200 machines, give or take a few.  I estimate
around 3000 nodes total.  Some of these are dual P3, some are single P4, and
a few random Xeon and Itanium nodes.  Every node is running gmond, and
that's running fine.

Each subnet has a "master" node that is a dual P3 1.3GHz.  This box provides
DNS, NIS, and static DHCP for the subnet.  Normal load on these machines is
very, very minimal.

My first attempt was to set up a single dedicated Ganglia machine running
gmetad, Apache, and the web frontend.  In this machine's gmetad.conf file, I
listed each of the "master" nodes in the subnets as data sources.  I thought
having one box collect all the data and store the RRD files would be great.
Well, this was a bad idea...the box (a P4 with 2GB RAM) was absolutely
crushed...load shot up to 8.5, and all the graphs continually had gaps in
them.
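
For what it's worth, the data_source lines in that gmetad.conf looked
roughly like this (the hostnames and cluster names here are just
placeholders; there was one line per subnet, 16 in all):

    # each line points gmetad at one subnet's "master" node,
    # which answers on gmond's default port, 8649
    data_source "subnet01" master01:8649
    data_source "subnet02" master02:8649
    ...
    data_source "subnet16" master16:8649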

So my next attempt was to install gmetad on each of the "master" nodes.
Each of these gmetads collects data for its own subnet, and another gmetad
on my Ganglia web machine just talks to these 16 gmetads.  I don't really
like having to back up 16 machines now, but I've had problems before with
trying to store RRD files on an NFS mount, so I decided not to try that.
This isn't working all that great, either...the gmetad on these "master"
nodes (each collecting data from ~200 hosts) is also causing a pretty high
load...the boxes now sit at a load of 2-3 all the time, which sometimes
slows down other operations on them.
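
Roughly, the configs now look like this (again, the hostnames are
placeholders, and if I understand the docs right the top-level gmetad
talks to the lower-level ones on gmetad's xml_port, 8651 by default):

    # gmetad.conf on each subnet "master" node:
    # collect the local subnet via the gmond running on that box
    data_source "subnet01" localhost:8649

    # gmetad.conf on the web frontend box:
    # poll each subnet's gmetad instead of its gmonds
    data_source "subnet01" master01:8651
    data_source "subnet02" master02:8651
    ...
    data_source "subnet16" master16:8651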

Am I doing something wrong, or is gmetad really this much of a resource hog?
Anyone else trying to use Ganglia to monitor 3000 machines?  Am I asking too
much?  Thanks for any insight.

Steve Gilbert
Unix Systems Administrator
[EMAIL PROTECTED]
