Nicholas Henke wrote:
On Mon, 23 Dec 2002 18:40:26 -0500
Nicholas Henke <[EMAIL PROTECTED]> wrote:


Hello--
        I have installed ganglia on several of our clusters, but it seems
that the multicast channel is pumping out a ton of data. On a 96-node
cluster, I am seeing around 20-30Kb/s average data transfer on the
multicast channel (seen via ntop). I am _very_ interested in
reducing this; otherwise we will have to remove ganglia. I have seen
the 'deaf' and 'mute' options in gmond.conf, but have not seen a
difference in traffic patterns while using them.

Is there any way to disable the multicast channel and just have each
node report stats to one monitoring server?


Any updates on this? I have had to drop ganglia on our clusters until
we can get around this problem; it is killing MPI and NFS latencies.

Nic

In the absence of a useful response (I started to draft one when you first posted, but figured it wasn't productive), I can offer anecdotal evidence of Ganglia running on five times that number of sources, where the data rate (according to ntop) is around 100Kb/sec (~200-250 packets per second). That works out to an average packet size of around 60 bytes (12.5 KB/sec spread over ~225 packets/sec).

Or, to put it another way, it's 0.1% of the bandwidth on a fully-switched Fast Ethernet network (assuming blue-sky conditions). The latency point is, of course, well-taken. The packet rate is determined by the multicast thresholds (they're compiled-in, but you can change them in the header file and recompile the monitoring core, hopefully making it better-behaved).
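
To make the threshold point concrete, the table you would be editing looks roughly like the sketch below. The struct layout, field names, and numbers are illustrative only (I am not quoting the actual Ganglia header), so read your own source tree before touching anything:

    /* Hypothetical per-metric threshold table, in the spirit of the
     * compiled-in defaults in the monitoring core's metric header.
     * Raising value_threshold and the max check interval makes a
     * metric multicast less often. */
    struct metric_threshold {
        const char *name;       /* metric name                     */
        int check_min;          /* min seconds between checks      */
        int check_max;          /* max seconds between checks      */
        float value_threshold;  /* min change before a send        */
    };

    static struct metric_threshold thresholds[] = {
        { "load_one", 15,  20,  1.0 },  /* was e.g. 0.1: sent on any wiggle */
        { "cpu_user", 20,  90,  5.0 },  /* tolerate 5% drift before sending */
        { "bytes_in", 40, 300, 10.0 },
    };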

Another option (which it may or may not be absurd for me to even mention, depending on how much hardware you have lying around) is not to run Ganglia on your production (messaging) interface, but on an administrative interface instead. Hey, some people have dual on-board FE NICs on all their nodes... of course it's silly to do this just for Ganglia but there are other benefits to having an administrative interface...
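
If you go that route and your gmond is recent enough, you can point the multicast socket at the admin NIC from gmond.conf. Something like the fragment below (mcast_if is the directive I remember from the 2.5.x config, if memory serves, and eth1 is a stand-in for whatever your administrative interface is called):

    # /etc/gmond.conf -- keep ganglia chatter off the MPI/NFS fabric
    name "my-cluster"
    mcast_channel 239.2.11.71
    mcast_port 8649
    mcast_if eth1          # bind the multicast socket to the admin NIC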

Multicast is the only reporting method currently implemented in the Ganglia monitoring core. I don't know if Ganglia 3's framework will be sufficiently open to allow adding new reporting methods, or even if there's a significant demand for them (lurkers, speak up now?).

And it may well be that Ganglia's not the right tool for this particular job. An SNMP implementation may serve you better.

Hey, I wonder what would happen if someone specified a non-multicast IP (running gmond, of course) as the target multicast network... anyone ever try that? Even if this worked, only the "target" monitoring core would have all the data (helloooooo, central point of failure), but it seems to meet your requirements.

Or someone could write an "aggregate update" mod to the monitoring core that requires it to meet a minimum buffer threshold before sending updates (see the sketch after this paragraph).

Or perhaps there are just one or two misbehaving metrics (run a couple of the monitoring cores on a separate multicast IP in debug mode and see which metrics pop up most often...).
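
To show the shape of the aggregate-update idea, here is a sketch; every name in it is made up for illustration, and nothing like it exists in the monitoring core today:

    /* Sketch of an "aggregate update" send path: instead of one UDP
     * packet per metric update, append encoded metrics to a buffer
     * and only flush once it passes a size (or age) threshold. */
    #include <string.h>
    #include <time.h>

    #define AGG_FLUSH_BYTES 1024     /* min payload before we send      */
    #define AGG_FLUSH_SECS    60     /* ...or max age of buffered data  */

    static char   agg_buf[1400];     /* stay under a typical Ethernet MTU */
    static size_t agg_len  = 0;
    static time_t agg_born = 0;

    static void agg_flush(void)
    {
        if (agg_len == 0)
            return;
        /* ...write agg_buf/agg_len to the multicast socket here... */
        agg_len = 0;
    }

    /* Called wherever the core currently sends a single metric packet.
     * Assumes len <= sizeof(agg_buf). */
    void agg_send(const void *msg, size_t len, time_t now)
    {
        if (agg_len + len > sizeof(agg_buf))
            agg_flush();             /* no room: push what we have      */
        if (agg_len == 0)
            agg_born = now;
        memcpy(agg_buf + agg_len, msg, len);
        agg_len += len;
        if (agg_len >= AGG_FLUSH_BYTES || now - agg_born >= AGG_FLUSH_SECS)
            agg_flush();
    }

The trade-off, of course, is freshness: buffered updates arrive in bursts, so the web frontend's view of a node can lag by up to the flush interval.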

Anyway, I know the above comments are rather useless but if it keeps your problem in the list's consciousness it can't be all bad. ;)

