Igor Rosenberg wrote:
Hello there, I've decided to use ganglia to provide the monitoring framework for my application (running over globus GT4). I've been a bit distressed not to be able to locate the federated documentation of ganglia. Where is the *up-to-date* information gathered? Is there more than the following:
I had to figure this out recently as well. I don't have any magic docs to offer, but here are some things I discovered that weren't quite so obvious. Also, this might not even be correct, but it works for me (i.e. there may be a more elegant solution).
Any corrections are welcome.
The short version is this:
1) In gmond.conf, set the cluster name: 'cluster { name="Cluster_Name" }'
2) Make sure that the gmond processes within each cluster can talk to
each other via unicast or multicast.
3) Define the 'gridname' setting in gmetad.conf
4) Configure gmetad to poll all of the gmond and gmetad processes that
collect data for your clusters. Make sure that the Cluster_Name
matches what was set in gmond.conf. The format is:
data_source "Cluster_Name" 10.10.10.10:8649
5) Start gmond.
6) Start gmetad.
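The name-matching step is easy to get wrong, so here is a minimal sketch of a check for it. It is not part of ganglia; the function names are made up for illustration, and the regexes assume simple config files like the ones in this post (no quoted strings containing '#' or '}').

```python
import re

def strip_comments(text):
    """Drop /* ... */ comments and '#' comments from config text."""
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.S)
    return re.sub(r"#.*", "", text)

def gmond_cluster_name(gmond_conf):
    """Return the name set inside the cluster { ... } block, or None."""
    m = re.search(r'\bcluster\s*\{[^}]*?\bname\s*=\s*"([^"]*)"',
                  strip_comments(gmond_conf))
    return m.group(1) if m else None

def gmetad_sources(gmetad_conf):
    """Return the set of names used on data_source lines."""
    return set(re.findall(r'^\s*data_source\s+"([^"]*)"',
                          strip_comments(gmetad_conf), flags=re.M))

def name_is_polled(gmond_conf, gmetad_conf):
    """True if the gmond cluster name appears as a data_source name."""
    return gmond_cluster_name(gmond_conf) in gmetad_sources(gmetad_conf)
```

Read both files into strings and call name_is_polled(gmond_text, gmetad_text); a False result means the names do not line up the way step 4 asks.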
Here's a longer example...
I have one cluster that I broke up into two parts: a "compute" cluster with
the compute nodes, and a "storage" cluster that tracks the various NFS
servers. The compute cluster is named "GC_Nodes," and the storage cluster
is, originally enough, "GC_Storage." There is a single master node for the
whole system, and both GC_* ganglia clusters talk to it.
So far as I can tell, each federation (i.e. GC_Nodes or GC_Storage) needs to at least have a different cluster name. I also send traffic for the different clusters on different UDP ports, although I do not know if this is strictly necessary. I use a combination of multicast and unicast.
On the master node, gmond.conf has these entries:
<---------------------->
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8649 /* cluster */
  bind = 239.2.11.71
}
udp_recv_channel {
  port = 8649 /* cluster */
}
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8648 /* storage */
  bind = 239.2.11.71
}
tcp_accept_channel { /* for gmetad */
  port = 8649
}
<---------------------->
Most of the GC_Nodes boxes use multicast to send data between gmond processes,
but some also use unicast. This all happens on port 8649, and (I think) requires
two separate entries for udp_recv_channel {}: one joined to the multicast group
and one plain entry for the unicast traffic.
The compute nodes have this configuration (abbreviated):
<---------------------->
cluster {
  name = "GC_Nodes" /* Define the cluster to which I belong */
}
udp_send_channel { /* send data via multicast */
  mcast_join = 239.2.11.71
  port = 8649
}
udp_recv_channel { /* accept data from multicast */
  mcast_join = 239.2.11.71
  port = 8649 /* assumed: left out of the abbreviated original, matches the send channel */
  bind = 239.2.11.71
}
tcp_accept_channel { /* for gmetad */
  port = 8649
}
<---------------------->
The storage cluster is a bit different. Only unicast is used, and gmond
should use port 8648. The data is all sent directly to one of the storage
nodes, and the same gmond.conf is used on all hosts:
<---------------------->
cluster {
  name = "GC_Storage" /* different ganglia cluster! */
}
udp_send_channel {
  host = storage-1
  port = 8648
}
tcp_accept_channel {
  port = 8649 /* and still allow gmetad to connect */
}
<---------------------->
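With several send and receive channels spread over two ports, it is easy to end up with a sender that nobody is listening to. Below is a rough sketch of a sanity check, assuming a send channel is "heard" when the receiving host has a udp_recv_channel with the same port and the same mcast_join (or no mcast_join on either side, for unicast). The helper names are hypothetical and the parsing only handles simple unquoted values like the ones shown here.

```python
import re

def channels(conf, kind):
    """Return one {key: value} dict per `kind { ... }` block in conf."""
    conf = re.sub(r"/\*.*?\*/", "", conf, flags=re.S)  # drop /* ... */ comments
    pattern = r"\b%s\s*\{([^}]*)\}" % re.escape(kind)
    return [dict(re.findall(r"(\w+)\s*=\s*(\S+)", body))
            for body in re.findall(pattern, conf)]

def has_listener(send, receivers):
    """True if some receive channel matches the sender's port and mcast group."""
    return any(r.get("port") == send.get("port")
               and r.get("mcast_join") == send.get("mcast_join")
               for r in receivers)

def unmatched_senders(sender_conf, receiver_conf):
    """List the udp_send_channel blocks that no udp_recv_channel would hear."""
    receivers = channels(receiver_conf, "udp_recv_channel")
    return [s for s in channels(sender_conf, "udp_send_channel")
            if not has_listener(s, receivers)]
```

Pass the sending host's gmond.conf text and the receiving host's gmond.conf text; an empty list means every send channel has a plausible listener under the assumption above.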
So that's the gmond configuration. The gmetad configuration is pretty simple:
data_source "GC_Nodes" localhost:8649
data_source "GC_Storage" storage-1:8649
Note that port 8649 is used. This is okay, because it is a TCP port, not a
UDP port, so there's no conflict in that regard.
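To see what gmetad will get from one of those TCP ports, you can connect to it yourself: gmond writes its XML to anyone who connects to a tcp_accept_channel and then closes the connection. A small sketch follows; the function name and the default host, port, and timeout values are just illustrative choices.

```python
import socket

def fetch_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Connect to a tcp_accept_channel and return the raw bytes sent until EOF."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # the peer closed the connection: dump is complete
                break
            chunks.append(data)
    return b"".join(chunks)
```

For the setup in this post, fetch_gmond_xml("localhost", 8649) and fetch_gmond_xml("storage-1", 8649) would be the two sources gmetad polls.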
This might be simpler with a picture: http://image.bayimg.com/jaicnaabo.jpg
When making the image, I realized a few things about my configuration that aren't 100% perfect: gmond on the master node doesn't need to listen on 239.2.11.71:8648, for example, nor does gmond on storage-1 need to send updates to itself. I should probably fix those.
One thing I am not sure about is whether you can have multiple clusters reporting on the same ports. In my configuration, I explicitly separated the two across UDP ports 8648 and 8649. It seems that the XML generated by gmond *is* organized by <CLUSTER NAME> tag, so it looks like this is supported... I'm not using it that way though (and probably making the configuration more complicated because of it), so I can't say for certain.
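One way to look at that grouping is to parse the XML and list the hosts under each CLUSTER element. The sketch below uses Python's standard ElementTree; the function name is made up, and the sample in the usage note is hand-written with most attributes trimmed, so treat it as an illustration rather than real gmond output.

```python
import xml.etree.ElementTree as ET

def hosts_by_cluster(xml_text):
    """Map each CLUSTER NAME to the list of HOST NAMEs nested under it."""
    root = ET.fromstring(xml_text)
    return {cluster.get("NAME"): [host.get("NAME") for host in cluster.iter("HOST")]
            for cluster in root.iter("CLUSTER")}
```

Fed a document such as '<GANGLIA_XML><CLUSTER NAME="GC_Storage"><HOST NAME="storage-1"/></CLUSTER></GANGLIA_XML>', it returns a dictionary keyed by cluster name, which makes it easy to see whether hosts from different clusters ended up mixed together.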
-- Jesse Becker NHGRI Linux support (Digicon Contractor)

