It's not so much that I "fear" multicast; it's just that I see no need for it. Admittedly, my setup is not necessarily like others', so my preferences don't necessarily apply to other setups, and I wouldn't claim that my setup is the "best" setup by any means. But if you are interested, I will try to explain my personal reasoning for not using multicast. (I have replied to parts of your previous email below.)

Like Rick said below, the packets are small (60 bytes) and each node sends only about 20 per minute on average. Do the math: even with a large cluster you are talking about a tiny fraction of the network's capacity.

This is not the whole picture (more on this below). But even if it is a small amount, it's extra traffic that I do not need. At best, it is unnoticeable and serves no purpose. (So why keep it?) At worst, the UDP traffic interferes with my NFS traffic and results in extra RPC retransmits.

For us, redundancy is important because we have many clusters and do not want to single out one node in each cluster to make it a special monitoring node that needs to be up 24/7 in order to collect ganglia monitoring data.

I agree, which is why our monitoring node is not part of any cluster. It is a separate machine that provides other monitoring services, so it is expected to be up 24x7 anyway. Besides, I can easily get redundancy by sending unicast to two or three hosts. I don't need 400 copies of my cluster's data. And even if I did have that many copies, when I modify gmetad.conf I certainly don't plan to enter 400 host names as alternative sources for gmond info.
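To be concrete, the sort of gmetad.conf entry I have in mind just lists the two or three hosts that receive the unicast traffic (the host names below are made up, and the port shown is gmond's default); as I understand it, gmetad tries the listed sources in order, so a couple of entries are all the failover you need:

    data_source "my cluster" mon1.example.org:8649 mon2.example.org:8649

That is one short line to maintain, versus a list of 400 node names.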

With multicast's redundancy, we won't have to worry about the possibility that the one special node out of our 400+ node cluster has crashed, taking ganglia down with it. If the node gmetad is currently getting data from has a problem, it will transparently switch to another.

I don't worry about redundancy either, but this also fails to address the availability of the gmetad process. Ganglia isn't very useful if the gmetad server goes down and it is not recording the metrics to disk. Since I pretty much need to make sure that gmetad is running 24x7 anyway, why not let it collect gmond info? The server I run gmetad on is better equipped for availability than any of the compute nodes in the cluster.

Our largest cluster has 427 Linux nodes with a basic gmond using all of
the default metrics.  I ran a tcpdump for about 10 minutes to capture
all of the multicast data in that cluster and here is what I found (*):

5,256,772 bytes collected in 628.45 seconds from 87,582 packets.

In other words, the average rate was:

139.362 packets/second
8364.668 bytes/second
0.066 Mbits/second

Thanks for supplying this data. It is a good baseline for people to use when planning their Ganglia deployments. However, I add a lot more metrics than the default. I estimate that I get about 500 packets/second, which translates to about 30,000 bytes/second. Plus, this "background noise" scales linearly with the number of nodes in the cluster.
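As a rough sanity check, here is the arithmetic spelled out (the 60-byte packet size, the ~20 packets/minute per node, and the 427-node count are the figures quoted above; the 500 packets/second is my own estimate for our cluster with the extra metrics):

    # Back-of-the-envelope gmond traffic estimates, using figures from this thread.
    PACKET_BYTES = 60                     # approximate size of one metric packet

    def rates(pps):
        bytes_per_sec = pps * PACKET_BYTES
        return (pps, bytes_per_sec, bytes_per_sec * 8 / 1e6)

    # Default metric set: ~20 packets/minute per node across 427 nodes.
    print("default metrics: %.0f packets/s, %.0f bytes/s, %.3f Mbit/s" % rates(427 * 20 / 60.0))

    # With the extra metrics I add: roughly 500 packets/second cluster-wide.
    print("extra metrics:   %.0f packets/s, %.0f bytes/s, %.3f Mbit/s" % rates(500))

The first estimate (about 142 packets/second, 0.068 Mbit/s) lines up with the measured numbers above; the second is where my 30,000 bytes/second figure comes from.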

I should also mention that the 60-byte metric packet actually consumes more than 60 bytes in the Linux socket receive buffer. On my system, I ran gmond and then sent it the STOP signal. I then used gmetric to submit a single metric by hand. By looking at the rx_queue column in /proc/net/udp, I saw that the one metric actually took up 304 bytes. The buffer fills up pretty quickly, and UDP packets start dropping when the number of metrics gets large.
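If anyone wants to repeat that check, here is a throwaway sketch of the bookkeeping (not part of ganglia; it assumes gmond is on its default port 8649 and relies on /proc/net/udp reporting the port and the tx_queue:rx_queue column in hex):

    # Sum the UDP receive-queue bytes for a given local port from /proc/net/udp.
    GMOND_PORT = 8649                     # gmond's default port; adjust if yours differs

    def rx_queue_bytes(port):
        total = 0
        for line in open("/proc/net/udp").readlines()[1:]:   # [1:] skips the header
            fields = line.split()
            local_port = int(fields[1].split(":")[1], 16)    # local_address is addr:port, in hex
            if local_port == port:
                total += int(fields[4].split(":")[1], 16)    # tx_queue:rx_queue, also in hex
        return total

    print("rx_queue bytes on port %d: %d" % (GMOND_PORT, rx_queue_bytes(GMOND_PORT)))

Stop gmond, inject one metric with gmetric, and the number reported jumps by the per-metric cost (304 bytes in my case).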

Then you also have to consider the bandwidth used by gmetad when it contacts gmond. I tested having gmetad contact one of my cluster nodes to get the cluster info. My test showed that it took about 0.7 seconds to retrieve about 2.5 MB of data. That's 28.5 Mbps of the node's bandwidth used every 15 seconds. (In a previous post I also mentioned how I saw increased UDP drop rates whenever gmetad contacted a gmond.)
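Spelled out, with the figures from my test (2.5 MB in 0.7 seconds, polled every 15 seconds, which is gmetad's default interval):

    # Cost of one gmetad poll of a single gmond, using the numbers from my test.
    poll_bytes   = 2.5e6    # roughly 2.5 MB of XML per poll
    poll_seconds = 0.7      # how long the transfer took
    poll_period  = 15.0     # gmetad's polling interval in seconds

    print("burst rate during poll:   %.1f Mbit/s" % (poll_bytes * 8 / poll_seconds / 1e6))
    print("fraction of time polling: %.1f%%" % (100 * poll_seconds / poll_period))

So roughly 28-29 Mbit/s in bursts, and the node spends close to 5% of its time servicing gmetad.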

So those are my reasons for choosing unicast: with multicast I see plenty of potential for problems and very little benefit. Other people's mileage may vary.

-- Rick

--------------------------
Rick Mohr
Systems Developer
Ohio Supercomputer Center
