It's not so much that I "fear" multicast; I just see no need for it.
Admittedly, my setup is not necessarily like others', and therefore my
preferences don't necessarily apply to other setups. And I wouldn't claim that
my setup is the "best" setup by any means. But if you are interested, I will
try to explain my personal reasoning for not using multicast. (I have replied
to parts of your previous email below.)
Like Rick said below, the packets are small (60 bytes) and each node sends
only about 20/minute on average. Do the math, even with a large cluster you
are talking about a tiny fraction of the network's capacity.
This is not the whole picture (more on this below). But even if it is a small
amount, it's extra traffic that I do not need. At best, it is unnoticeable and
serves no purpose. (So why keep it?) At worst, the UDP traffic interferes with
my NFS traffic and results in extra RPC retransmits.
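For anyone who wants to do that math themselves, here is a rough Python sketch
of the arithmetic. The 400-node count and the 100 Mbit link speed are
assumptions for illustration, not measurements from our clusters:

# Back-of-envelope load from the quoted numbers: 60-byte metric packets,
# roughly 20 per node per minute, aggregated across the whole cluster.
def multicast_load(nodes, pkt_bytes=60, pkts_per_min=20, link_mbps=100):
    pkts_per_sec = nodes * pkts_per_min / 60.0
    bits_per_sec = pkts_per_sec * pkt_bytes * 8
    pct_of_link = 100.0 * bits_per_sec / (link_mbps * 1e6)
    return pkts_per_sec, bits_per_sec, pct_of_link

pps, bps, pct = multicast_load(nodes=400)          # hypothetical cluster size
print("%.1f pkts/s, %.0f bits/s, %.3f%% of a 100 Mbit link" % (pps, bps, pct))

That confirms the point: the raw bandwidth really is small. My objection is the
interference and the lack of benefit, not the volume.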
For us, redundancy is important because we have many clusters and do not want
to single out one node in each cluster to make it a special monitoring node
that needs to be up 24/7 in order to collect ganglia monitoring data.
I agree, which is why our monitoring node is not part of any cluster. It is a
separate machine that provides other monitoring services, so it is expected to
be up 24x7 anyway. Besides, I can easily get redundancy by sending unicast to
two or three hosts. I don't need 400 copies of my cluster's data. And even if
I did have that many copies, when I modify gmetad.conf, I certainly don't plan
to enter 400 host names as alternative sources for gmond info.
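To be concrete about what I mean by unicast redundancy: each node just sends
the same metric datagram to a short, fixed list of collectors. This Python
sketch is only illustrative; the host names are made up, the payload is plain
text rather than Ganglia's real XDR encoding, and 8649 is simply gmond's
default port:

import socket

COLLECTORS = ["mon1.example.org", "mon2.example.org"]   # hypothetical hosts
PORT = 8649                                             # default gmond port

def send_metric(payload):
    # Send the identical UDP datagram to every collector in the list.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for host in COLLECTORS:
        s.sendto(payload, (host, PORT))
    s.close()

send_metric(b"example_metric 3.2")

Two or three entries in that list give me all the redundancy I care about, and
gmetad.conf only ever needs those same two or three names as data sources.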
With multicast's redundancy, we won't have to worry about the possibility
that the one special node out of our 400+ node cluster has crashed taking
ganglia down with it. If the node gmetad is currently getting data from has a
problem, it will transparently switch to another.
I don't worry about the redundancy issue either, but multicast also fails to
address the availability of the gmetad process. Ganglia isn't very useful if
the gmetad server goes down and stops recording the metrics to disk. Since I pretty
much need to make sure that gmetad is running 24x7 anyway, why not let it
collect gmond info? The server I run gmetad on is better equipped for
availability than any of the compute nodes in the cluster.
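Keeping an eye on gmetad itself is cheap, too; a trivial probe of its XML port
tells you whether it is up and answering. In this Python sketch the monitoring
host name is made up, and 8651 is gmetad's default xml_port:

import socket

def gmetad_alive(host="monitor.example.org", port=8651, timeout=5):
    # gmetad dumps its XML as soon as a client connects, so any data means up.
    try:
        s = socket.create_connection((host, port), timeout)
        data = s.recv(4096)
        s.close()
        return len(data) > 0
    except socket.error:
        return False

print("gmetad up" if gmetad_alive() else "gmetad DOWN")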
Our largest cluster has 427 linux nodes with a basic gmond using all of
the default metrics. I ran a tcpdump for about 10 minutes to capture
all of the multicast data in that cluster and here is what I found (*):
5,256,772 bytes collected in 628.45 seconds from 87,582 packets.
In other words, the average rate was:
139.362 packets/second
8364.668 bytes/second
0.066 Mbits/second
Thanks for supplying this data. It is a good baseline for people to use when
planning their Ganglia deployments. However, I add a lot more metrics than the
default. I estimate that I get about 500 packets/second, which translates to
about 30,000 bytes/sec. Plus, this "background noise" scales linearly with the
number of nodes in the cluster.
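Here is the same arithmetic in a few lines of Python, for anyone who wants to
redo it. The capture numbers are the ones quoted above, while the 500
packets/sec and 60 bytes/packet for my setup are estimates rather than
measurements:

# Measured capture from the 427-node cluster quoted above.
cap_bytes, cap_secs, cap_pkts = 5256772, 628.45, 87582
print("%.3f packets/second" % (cap_pkts / cap_secs))
print("%.3f bytes/second" % (cap_bytes / cap_secs))
print("%.3f Mbits/second" % (cap_bytes * 8 / cap_secs / 1e6))  # decimal Mbit

# My estimate with many extra metrics enabled.
est_pps, est_pkt_bytes = 500, 60
print("estimated %d bytes/second" % (est_pps * est_pkt_bytes))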
I should also mention that a 60-byte metric packet actually consumes more than
60 bytes in the Linux socket receive buffer. On my system, I ran gmond and then
sent it the STOP signal. I then used gmetric to submit a single metric by hand.
By looking at the rx_queue column in /proc/net/udp, I saw that the one metric
actually took up 304 bytes. The buffer fills up pretty quickly, and UDP packets
start dropping when the number of metrics gets large.
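If you want to check the buffer accounting on your own system, the rx_queue
numbers can be pulled straight out of /proc/net/udp with a few lines of Python
(8649 is gmond's default UDP port; the per-packet overhead you see will depend
on your kernel):

GMOND_PORT = 8649   # default gmond UDP port

# In /proc/net/udp, field 1 is local addr:port and field 4 is
# tx_queue:rx_queue, both in hex.
for line in open("/proc/net/udp").readlines()[1:]:
    fields = line.split()
    local_port = int(fields[1].split(":")[1], 16)
    rx_queue = int(fields[4].split(":")[1], 16)
    if local_port == GMOND_PORT:
        print("port %d rx_queue = %d bytes" % (local_port, rx_queue))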
Then you also have to consider the bandwidth used by gmetad when it contacts
gmond. I tested having gmetad contact one of my cluster nodes to get the
cluster info. My test showed that it took about 0.7 seconds to retrieve
about 2.5 MB of data. That's 28.5 Mbps of the node's bandwidth used every 15
seconds. (In a previous post I also mentioned that I saw increased UDP drop
rates whenever gmetad contacted a gmond.)
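The pull test is also easy to reproduce: connect to a gmond's TCP port, time
how long the XML dump takes, and divide. In the Python sketch below the node
name is hypothetical; 8649 is the port gmond answers XML requests on by
default:

import socket, time

def time_pull(host="node001.example.org", port=8649):
    # Read the full cluster XML dump and report the effective transfer rate.
    start = time.time()
    s = socket.create_connection((host, port), 10)
    total = 0
    while True:
        chunk = s.recv(65536)
        if not chunk:
            break
        total += len(chunk)
    s.close()
    elapsed = time.time() - start
    print("%d bytes in %.2f s = %.1f Mbit/s"
          % (total, elapsed, total * 8 / elapsed / 1e6))

time_pull()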
So those are my reasons for choosing unicast. Plenty of potential for
problems, very little benefit. Other people's mileage may vary.
-- Rick
--------------------------
Rick Mohr
Systems Developer
Ohio Supercomputer Center