Hi all,

I'm looking at deploying Ganglia across an installation of a few hundred machines, but I have a query about how the grouping into 'clusters' works.
I need to be able to monitor various sets of machines (compute farms, disk farms, NIS, tape robot) independently, but I like the redundancy provided by the data pools that gmonds build within the same multicast group. In particular, I want to monitor, say, a small tape robot system alongside a huge compute farm, and I'd like the metrics for the robot pooled on a good number of machines, not just on the small number (possibly one) of robot machines.

I can find nothing in the docs to suggest this wouldn't simply work. However, it doesn't. Looking through the code, it seems the XML feed from a gmond always contains just one <CLUSTER> element, and its NAME and OWNER attributes are filled in using only the values configured on the host supplying the XML. There seems to have been some intention of different behaviour at some point, since the DTD permits multiple CLUSTER elements. Does anybody know what the plans here might be?

I'd be grateful if anybody could point out whether I'm overlooking something, or whether there's a better way of doing what I want. I could put each group I want to monitor onto a different multicast channel, but then I lose some redundancy and gmetad has to do a lot more polling!

Many thanks,
Phil
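
P.S. In case anyone wants to check the single-<CLUSTER> behaviour on their own setup, here is a rough Python sketch. It assumes gmond dumps its XML to anything that connects on TCP port 8649 (the usual default; adjust if your config differs). The function names are just ones I made up for illustration, and this is not taken from the Ganglia code.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read the XML dump a gmond writes when a client connects.

    Assumption: gmond listens on TCP `port` and closes the
    connection once the whole document has been sent.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # peer closed the connection: dump is complete
                break
            chunks.append(data)
    return b"".join(chunks)


def summarize_clusters(xml_bytes):
    """Return one (NAME, OWNER, host_count) tuple per CLUSTER element."""
    root = ET.fromstring(xml_bytes)
    return [
        (cluster.get("NAME"), cluster.get("OWNER"), len(cluster.findall("HOST")))
        for cluster in root.iter("CLUSTER")
    ]
```

Calling summarize_clusters(fetch_gmond_xml("node01.example.org")) (hypothetical host name) against any node in the multicast group should show how many CLUSTER elements come back and whose NAME and OWNER they carry.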

