Hi all,

I'm looking at deploying ganglia across an installation of a few hundred 
machines, but I have a question about how the grouping into 'clusters' works.

I need to be able to monitor various sets of machines (compute farms, disk 
farms, NIS, tape robot) independently, but I like the redundancy provided 
by the data pools built by gmonds within the same multicast group.  In 
particular, I want to monitor, say, a small tape robot system alongside a 
huge compute farm, and I'd like the metrics for the robot pooled on a 
good number of machines, not just on the small number (possibly one) of 
robot machines.

I can't find anything in the docs to suggest that this wouldn't work.  
However, it doesn't.  Looking through the code, it 
seems the XML feed from a gmond always contains just one <CLUSTER> 
element, and the NAME and OWNER attributes are filled using the values on 
the host supplying the XML only.
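
For anyone who wants to check this on their own setup, here is a minimal 
Python sketch that lists the <CLUSTER> elements in a gmond XML dump.  The 
sample XML, the host names and the helper names are made up for 
illustration, and I'm assuming gmond serves its XML on TCP port 8649; 
adjust to match your configuration.

```python
import socket
import xml.etree.ElementTree as ET

# Hypothetical sample, shaped like the feed described above: a single
# CLUSTER element whose NAME and OWNER come from the answering host,
# even though a "robot" machine shares the multicast group.
SAMPLE_XML = """<GANGLIA_XML VERSION="0" SOURCE="gmond">
<CLUSTER NAME="compute" OWNER="physics" LOCALTIME="0">
<HOST NAME="node001" IP="10.0.0.1"/>
<HOST NAME="robot01" IP="10.0.0.200"/>
</CLUSTER>
</GANGLIA_XML>"""


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read the whole XML dump from a gmond (port 8649 is an assumption)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def summarize_clusters(xml_text):
    """Return a list of (cluster name, owner, [host names]) tuples."""
    root = ET.fromstring(xml_text)
    return [
        (c.get("NAME"), c.get("OWNER"), [h.get("NAME") for h in c.findall("HOST")])
        for c in root.iter("CLUSTER")
    ]


if __name__ == "__main__":
    # Swap SAMPLE_XML for fetch_gmond_xml("some-host") to query a live gmond.
    for name, owner, hosts in summarize_clusters(SAMPLE_XML):
        print(name, owner, hosts)
```

If every host in the multicast group turns up under the one cluster name, 
whichever gmond you ask, that matches what I'm seeing.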

It looks as though different behaviour may have been intended at some 
point, since the DTD permits multiple CLUSTER elements.
Does anybody know what the plans here might be?

I'd be grateful if anybody could point out if I'm overlooking something, 
or if there's a better way of doing what I want.  I could put each group I 
want to monitor onto a different multicast channel, but then I lose some 
redundancy and gmetad has to do a lot more polling!

Many thanks
Phil

