Mr. Dibowitz,
I am unclear as to why you have "all of the boxes...unicast their data
to the ganglia server". Unless you want the data duplicated, this is a
bad idea. I am not sure what you mean by classes or by "ganglia setup is a
cluster". I think you should probably make each thing you call a class
a cluster instead. You can have multiple clusters under one
gmetad. IMHO, the gmetad.conf on the 'ganglia server' should look like this:
data_source "foo" foo1 foo2
data_source "bar" bar1 bar2
If you want to collect metrics for the 'ganglia server' as well, that is
fine; just don't have the gmonds in foo send their data to the gmond
running on the 'ganglia server'.
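For the gmond side, something along these lines might work on every box in foo. This is only a sketch: it assumes gmond 3.0.x syntax, the default port 8649, and a cluster name chosen to match the data_source name, so adjust it to your setup.

```
/* gmond.conf on each foo box (sketch, not a tested config) */
cluster {
  name = "foo"   /* assumed to match the data_source name in gmetad.conf */
}

/* unicast to the two collector nodes only, not to the 'ganglia server' */
udp_send_channel {
  host = foo1
  port = 8649
}
udp_send_channel {
  host = foo2
  port = 8649
}

/* needed on foo1 and foo2 so they can receive the data and serve it to gmetad */
udp_recv_channel {
  port = 8649
}
tcp_accept_channel {
  port = 8649
}
```

With this layout gmetad polls foo1 (falling back to foo2) and each host's metrics arrive only once.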
It sounds like you want to have clusters of clusters, and to do that you
need to use grids, which are a slightly more complicated ball of wax.
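As a rough sketch of the grid approach, each per-site gmetad names its own grid and a top-level gmetad lists the per-site gmetads as data sources. The host names site1-gmetad, site2-gmetad and top-gmetad are made up for illustration, and the port assumes gmetad's default xml_port of 8651.

```
# gmetad.conf on a per-site gmetad (hypothetical host site1-gmetad)
gridname "site1"
data_source "foo" foo1 foo2
data_source "bar" bar1 bar2
# let the top-level gmetad poll this one (host name is an assumption)
trusted_hosts top-gmetad

# gmetad.conf on the top-level gmetad (hypothetical host top-gmetad)
gridname "all-sites"
data_source "site1" site1-gmetad:8651
data_source "site2" site2-gmetad:8651
```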
Cheers,
Ian
Phil Dibowitz wrote:
Hey folks,
We have 5 identical (but separate) ganglia setups. All running gmond/gmetad
3.0.2. All but one of these setups have a common problem: they only write
data to data-source RRDs and not the global cluster RRDs.
The setups are fairly simple:
- each ganglia setup is a cluster
- each cluster is divided into classes
- all of the boxes in each class _unicast_ their data to the 1st and 2nd
instance of their class (foo1, foo2, foo3, foo4..fooN unicast to foo1 and foo2)
- all of the boxes also _unicast_ their data to the 'ganglia server' (which
is running gmetad, gmond, and the ganglia-web stuff)
- gmetad has a data_source setup for each cluster where it pulls from the
first instance of that class (and if that fails the second instance). For
example:
data_source "foo" foo1 foo2
And a 'misc' class which it pulls from itself:
data_source "misc" localhost
Here comes the problem. For any given host - let's say, foo4 - it writes the
data to the RRD in foo/foo4 but NOT in the cluster directory (we'll call it
'cluster1' - each one is different), cluster1/foo4/. The RRDs are in
cluster1/foo4, but they have nothing but NaNs in them.
If you telnet to localhost 8649 you can see all the data under CLUSTER
NAME="cluster1"... the data *is* there. It just only gets written under the
data_source directory and not under the global directory.
I've started up gmetad with debug9 and I see no problems.
I've tried dropping down to only one data_source (which housed 4 boxes).
I'm out of ideas. Any ideas would be helpful. Thanks.