I'm writing the RRDs to a local SCSI drive with an ext3 filesystem.  I'll
investigate the RAM disk option.  Thanks!

Steve Gilbert
Unix Systems Administrator
[EMAIL PROTECTED]


-----Original Message-----
From: matt massie [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 17, 2003 5:02 PM
To: Steve Gilbert
Cc: '[email protected]'
Subject: Re: [Ganglia-general] Ganglia architecture and gmond load


steve-

the single biggest problem with scaling gmetad is disk i/o.  what 
type of filesystem are you writing the gmetad RRDs to?  most people have 
had very good luck using a RAM-based filesystem and then periodically 
syncing the data to disk.  

for example in linux,

% mount -t tmpfs tmpfs /mnt

now the /mnt directory is a ram-backed filesystem.  if the machine is 
rebooted, however, all the data is lost.  so you will need to write the 
contents of that filesystem to disk every now and then.
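
a minimal sketch of that periodic write-back, not a tested recipe.  the 
function name sync_rrds, the script name, and every path shown are 
hypothetical; it assumes a POSIX shell and a cp that accepts -R and -p.

```shell
#!/bin/sh
# Sketch: copy the RRD tree from a RAM-backed directory to persistent disk.
# sync_rrds is a hypothetical helper name; both directories are passed in
# as arguments so nothing here assumes a particular Ganglia layout.

sync_rrds() {
    src="$1"   # RAM-backed directory holding the live RRDs
    dst="$2"   # directory on a real disk that survives a reboot

    # Refuse to run if the source directory is missing.
    [ -d "$src" ] || return 1
    mkdir -p "$dst" || return 1

    # Copy the contents recursively, preserving modes and timestamps.
    cp -Rp "$src"/. "$dst"/ || return 1

    # Ask the kernel to flush buffered writes to disk.
    sync
}

# When run as a script with two arguments, perform one copy.
if [ "$#" -eq 2 ]; then
    sync_rrds "$1" "$2"
fi
```

saved as, say, /usr/local/sbin/sync-rrds.sh (a made-up name), it could be 
run from cron every ten minutes or so with the tmpfs directory and the disk 
directory as its two arguments.  running it with the arguments swapped at 
boot, before gmetad starts, would restore the last saved copy.  anything 
written since the last run is still lost on a crash.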

-matt

Today, Steve Gilbert wrote forth saying...

> From: Steve Gilbert <[EMAIL PROTECTED]>
> To: "'[email protected]'"
>     <[email protected]>
> Date: Wed, 17 Sep 2003 16:45:26 -0700
> Subject: [Ganglia-general] Ganglia architecture and gmond load
> 
> Hi folks,
> 
> I don't know if I'm just trying to push Ganglia to more than it can handle
> or if I'm doing something wrong, but no matter how I design my Ganglia
> structure, gmetad seems to always crush the machine where it runs.  Here's
> an overview of my environment:
> 
> Ganglia 2.5.4
> All hosts involved are running RedHat 7.2
> RRDtool version 1.0.45
> 
> I have 16 subnets, each with 200 machines give or take a few.  I estimate
> around 3000 nodes total.  Some of these are dual P3, some are single P4,
> and a few random Xeon and Itanium nodes.  Every node is running gmond, and
> that's running fine.
> 
> Each subnet has a "master" node that is a dual P3 1.3GHz.  This box
> provides DNS, NIS, and static DHCP for the subnet.  Normal load on these
> machines is very, very minimal.
> 
> My first attempt was to set up a single dedicated Ganglia machine running
> gmetad, Apache, and the web frontend.  In this machine's gmetad.conf file,
> I listed each of the "master" nodes in the subnets as data sources.  I
> thought having one box collect all the data and store the RRD files would
> be great.  Well, this was a bad idea...the box (a P4 with 2GB RAM) was
> absolutely crushed...load shot up to 8.5, and all the graphs continually
> had gaps in them.
> 
> So my next attempt was to install gmetad on each of the "master" nodes.
> I would have this gmetad collect data for the subnet, and then run another
> gmetad on my Ganglia web machine to just talk to these 16 other gmetads.
> I don't really like having to now back up 16 machines, but I've had
> problems before with trying to store RRD files on an NFS mount, so I
> decided not to try that.  This isn't working all that great, either...the
> gmetad on these "master" nodes (collecting data from ~200 hosts each) is
> also causing a pretty high load...the boxes now stay around 2-3 load
> points all the time, and this sometimes slows down other operations on
> the box.
> 
> Am I doing something wrong, or is gmetad really this much of a resource
> hog?  Anyone else trying to use Ganglia to monitor 3000 machines?  Am I
> asking too much?  Thanks for any insight.
> 
> Steve Gilbert
> Unix Systems Administrator
> [EMAIL PROTECTED]
> 
> 
> 
