Hi,

I'm wondering what the suggested setup is for a large Grid.  I'm having
trouble scaling ganglia to work on large clusters.

Consider the following:

- A pretty much default gmond.conf distributed to all cluster
  members.

- 20 clusters of 64 nodes each.  gmond running on each cluster node, plus
  on the cluster head node.

- gmond and gmetad running on the "admin" node, which has the Grid
  defined in gmetad and polls the information from each of the cluster
  head nodes (roughly as sketched below).
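
For concreteness, the gmetad.conf on the admin node looks roughly like
this (the grid name and hostnames are made up, the polling interval is
just an example, and I've trimmed it to a few of the 20 data_source
lines):

  gridname "OurGrid"

  # one data_source per cluster, polling the gmond on that cluster's head node
  data_source "cluster01" 60 head01.example.com:8649
  data_source "cluster02" 60 head02.example.com:8649
  ...
  data_source "cluster20" 60 head20.example.com:8649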

The gmetad configuration has been modified to store the RRDs in
/dev/shm, but that directory grows very large, so I'd like to move away
from that.
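
For reference, the only real change from the stock gmetad.conf is
something like this (going from memory, so the exact line may be
slightly off):

  # keep the RRDs on tmpfs rather than the default on-disk location
  rrd_rootdir "/dev/shm"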

Switching the RRD directory back to the default breaks things.  As soon
as gmetad gets through its first round of grabbing metrics from the head
nodes, the machine starts writing a lot of small updates to disk,
completely consuming the machine:

procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  0  17140 641292 453944 2009608    0    0     0     0 5092 20394 19 27 55  0  0
 3  0  17140 632364 453952 2009600    0    0     0     0 6358 17128 24 32 44  0  0
 1  0  17140 631744 453952 2009600    0    0     0     0 2579 7545  7 11 82  0  0
 0  0  17140 629264 453952 2009600    0    0     0     0 2099 11337  7  7 86  0  0
 0  0  17140 629264 453952 2009600    0    0     0     0  351  855  0  0 100  0  0
 0  1  17140 629264 453952 2009600    0    0     0  3456  986  793  0  0 59 41  0
 0  1  17140 629264 453952 2009600    0    0     0  3332 1159  897  0  0 50 50  0
 0  1  17140 629280 453952 2009600    0    0     0  3072 1019  814  0  0 50 50  0
 0  1  17140 629280 453952 2009600    0    0     0  1792  771  886  0  0 50 50  0
 0  1  17140 629280 453952 2009600    0    0     0  1284  588  761  0  0 50 50  0
 0  2  17140 629280 453952 2009600    0    0     0  1536  676  890  0  0 38 61  0
 0  2  17140 629296 453952 2009600    0    0     0  1280  613  763  0  0 50 50  0
 0  2  17140 629296 453952 2009600    0    0     0  2048  825  887  0  0 50 50  0


forever more...
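
Back of the envelope: with 20 clusters of 65 hosts each (64 nodes plus a
head node) and, say, 30-odd metrics per host from a default gmond (that
per-host count is a guess), that's on the order of 40,000 RRD files each
taking a small write every polling interval, which seems consistent with
the write pattern above.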

Is there a way I should be architecting the configuration files to make
ganglia scale to a Grid of this size?

I think I want to run gmetad on each head node and use that RRD data
without regenerating it on the admin node.  Is that possible?
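
To make the question concrete, something like the following two-tier
arrangement is what I have in mind (hostnames are made up, and I'm
guessing at how gmetad federation is supposed to work from the docs, so
please correct me if this is wrong):

  # gmetad.conf on each cluster head node: poll the local gmond and keep
  # that cluster's RRDs locally
  data_source "cluster01" localhost:8649
  rrd_rootdir "/var/lib/ganglia/rrds"

  # gmetad.conf on the admin node: point at the head-node gmetads
  # (their xml_port, default 8651) instead of at the gmonds directly
  gridname "OurGrid"
  data_source "cluster01" head01.example.com:8651
  data_source "cluster02" head02.example.com:8651
  ...

The part I'm unsure about is whether the admin-node gmetad can then serve
the Grid view without writing its own copy of every RRD, which is really
what I'm asking.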

Thanks,
mh

