Nicholas Henke wrote:
On Mon, 23 Dec 2002 18:40:26 -0500
Nicholas Henke <[EMAIL PROTECTED]> wrote:


Hello--
        I have installed ganglia on several of our clusters, but it seems
that the multicast channel is pumping out a ton of data. On a 96-node
cluster, I am seeing around 20-30Kb/s average data transfer on the
multicast channel (seen via ntop). I am _very_ interested in
reducing this; otherwise we will have to remove ganglia. I have seen
the 'deaf' and 'mute' options in gmond.conf, but have not seen a
difference in traffic patterns while using them.

Is there any way to disable the multicast channel and just have each
node report stats to one monitoring server?


Any updates on this? I have had to drop ganglia on our clusters until
we can get around this problem; it is killing MPI and NFS latencies.

Nic

In the absence of a useful response (I started to draft one when you first posted, but figured it wasn't productive), I can offer anecdotal evidence of Ganglia running on five times that number of sources, where the data rate (according to ntop) is around 100Kb/sec (~200-250 packets per second). That works out to an average packet size of around 60 bytes (12.5 KB/sec spread over ~225 packets/sec).

Or, to put it another way, it's 0.1% of the bandwidth on a fully-switched Fast Ethernet network (assuming blue-sky conditions). The latency point is, of course, well-taken. The packet rate is determined by the multicast thresholds (they're compiled-in, but you can change them in the header file and recompile the monitoring core, hopefully making it better-behaved).
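
To make the threshold point concrete, the table you would be editing looks roughly like the sketch below. The struct layout, field names, and numbers are illustrative only (I am not quoting the actual Ganglia header), so read your own source tree before touching anything:

    /* Hypothetical per-metric threshold table, in the spirit of the
     * compiled-in defaults in the monitoring core's metric header.
     * Raising value_threshold and the max check interval makes a
     * metric multicast less often. */
    struct metric_threshold {
        const char *name;       /* metric name                     */
        int check_min;          /* min seconds between checks      */
        int check_max;          /* max seconds between checks      */
        float value_threshold;  /* min change before a send        */
    };

    static struct metric_threshold thresholds[] = {
        { "load_one", 15,  20,  1.0 },  /* was e.g. 0.1: sent on any wiggle */
        { "cpu_user", 20,  90,  5.0 },  /* tolerate 5% drift before sending */
        { "bytes_in", 40, 300, 10.0 },
    };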

Another option (which it may or may not be absurd for me to even mention, depending on how much hardware you have lying around) is not to run Ganglia on your production (messaging) interface, but on an administrative interface instead. Hey, some people have dual on-board FE NICs on all their nodes... of course it's silly to do this just for Ganglia but there are other benefits to having an administrative interface...
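
If you go that route and your gmond is recent enough, you can point the multicast socket at the admin NIC from gmond.conf. Something like the fragment below (mcast_if is the directive I remember from the 2.5.x config, if memory serves, and eth1 is a stand-in for whatever your administrative interface is called):

    # /etc/gmond.conf -- keep ganglia chatter off the MPI/NFS fabric
    name "my-cluster"
    mcast_channel 239.2.11.71
    mcast_port 8649
    mcast_if eth1          # bind the multicast socket to the admin NIC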

Multicast is the only reporting method currently implemented in the Ganglia monitoring core. I don't know if Ganglia 3's framework will be sufficiently open to allow adding new reporting methods, or even if there's a significant demand for them (lurkers, speak up now?).

And it may well be that Ganglia's not the right tool for this particular job. An SNMP implementation may serve you better.

Hey, I wonder what would happen if someone specified a non-multicast IP (running gmond, of course) as the target multicast network... anyone ever try that? Even if this worked, only the "target" monitoring core would have all the data (helloooooo, central point of failure), but it seems to meet your requirements.

Or someone could write an "aggregate update" mod to the monitoring core that requires it to meet a minimum buffer threshold before sending updates (see the sketch after this paragraph).

Or perhaps there are just one or two misbehaving metrics (run a couple of the monitoring cores on a separate multicast IP in debug mode and see which metrics pop up most often...).
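
To show the shape of the aggregate-update idea, here is a sketch; every name in it is made up for illustration, and nothing like it exists in the monitoring core today:

    /* Sketch of an "aggregate update" send path: instead of one UDP
     * packet per metric update, append encoded metrics to a buffer
     * and only flush once it passes a size (or age) threshold. */
    #include <string.h>
    #include <time.h>

    #define AGG_FLUSH_BYTES 1024     /* min payload before we send      */
    #define AGG_FLUSH_SECS    60     /* ...or max age of buffered data  */

    static char   agg_buf[1400];     /* stay under a typical Ethernet MTU */
    static size_t agg_len  = 0;
    static time_t agg_born = 0;

    static void agg_flush(void)
    {
        if (agg_len == 0)
            return;
        /* ...write agg_buf/agg_len to the multicast socket here... */
        agg_len = 0;
    }

    /* Called wherever the core currently sends a single metric packet.
     * Assumes len <= sizeof(agg_buf). */
    void agg_send(const void *msg, size_t len, time_t now)
    {
        if (agg_len + len > sizeof(agg_buf))
            agg_flush();             /* no room: push what we have      */
        if (agg_len == 0)
            agg_born = now;
        memcpy(agg_buf + agg_len, msg, len);
        agg_len += len;
        if (agg_len >= AGG_FLUSH_BYTES || now - agg_born >= AGG_FLUSH_SECS)
            agg_flush();
    }

The trade-off, of course, is freshness: buffered updates arrive in bursts, so the web frontend's view of a node can lag by up to the flush interval.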

Anyway, I know the above comments are rather useless but if it keeps your problem in the list's consciousness it can't be all bad. ;)

