What sort of network switch do you have for the internal cluster network? Some network switches do not have good out-of-the-box support for multicast. If you have one of those switches, you might want to consider going to a unicast setup. This only takes a few small alterations to your gmond.conf on each node.
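As a rough sketch of what those alterations might look like (not a tested configuration for this cluster): the multicast lines are dropped from both channel sections and the send channel points at a single collecting host instead. The use of "viper" as the target is an assumption; it presumes the name resolves to the head node's eth0 address on the internal network, so substitute the internal IP address if it does not.

```
/* Unicast sketch: every node sends its metrics directly to one host.
   "viper" is an assumed target; use the head node's internal
   address here if the name does not resolve to it. */
udp_send_channel {
  host = viper
  port = 8649
  ttl = 1
}

/* The collecting host listens on the same port. The mcast_join,
   mcast_if and bind lines from the multicast setup are removed. */
udp_recv_channel {
  port = 8649
}
```

With a layout like this, only the gmond on the collecting host ends up with the state of the whole cluster, so gstat on a compute node would still be expected to list only that node, and gmetad would need to poll the collecting host.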
The other option is to search through the switch configuration, assuming it is a manageable switch, and play with some of the configuration options.

--
Steven DuChene
Senior HPC Technical Architect

-----Original Message-----
>From: Michael Galloway <[email protected]>
>Sent: Sep 30, 2009 5:27 PM
>To: [email protected]
>Subject: [Ganglia-general] 3.1.2 help with SLES10sp2 install
>
>good day all,
>
>i'm having a bit of trouble getting 3.1.2 running correctly on my cluster. its
>an SLES10sp2 x86_64 cluster. i have gmetad and gmond running on head node, and
>gmond running on nodes 1 and 2. the head node is multihomed as usual with eth0
>the internal network:
>
>viper:/etc/ganglia # netstat -ar
>Kernel IP routing table
>Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
>239.2.11.71     *               255.255.255.255 UH        0 0          0 eth0
>xxx.xxx.xxx.xxx *               255.255.252.0   U         0 0          0 eth1
>link-local      *               255.255.0.0     U         0 0          0 eth0
>172.16.0.0      *               255.255.0.0     U         0 0          0 eth0
>172.17.0.0      *               255.255.0.0     U         0 0          0 ib0
>172.17.0.0      *               255.255.0.0     U         0 0          0 ib1
>loopback        *               255.0.0.0       U         0 0          0 lo
>default         defaultrouter   0.0.0.0         UG        0 0          0 eth1
>
>and i've fixed gmond to eth0 on the head node with:
>
>/* Feel free to specify as many udp_send_channels as you like.  Gmond
>   used to only support having a single channel */
>udp_send_channel {
>  mcast_if=eth0
>  mcast_join = 239.2.11.71
>  port = 8649
>  ttl = 1
>}
>
>/* You can specify as many udp_recv_channels as you like as well. */
>udp_recv_channel {
>  mcast_if=eth0
>  mcast_join = 239.2.11.71
>  port = 8649
>  bind = 239.2.11.71
>}
>
>i've left the gmond.conf the default i made with gmond -t other than changing
>the name of the cluster in the cluster container:
>
>/*
> * The cluster attributes specified will be used as part of the <CLUSTER>
> * tag that will wrap all hosts collected by this instance.
> */
>cluster {
>  name = "Viper Cluster"
>  owner = "mgx"
>  latlong = "unspecified"
>  url = "unspecified"
>}
>
>
>the head node and nodes all report only themselves with gstat
>
>node001:~ # gstat -a
>CLUSTER INFORMATION
>       Name: Viper Cluster
>      Hosts: 1
>Gexec Hosts: 0
> Dead Hosts: 0
>  Localtime: Wed Sep 30 17:24:19 2009
>
>CLUSTER HOSTS
>Hostname                     LOAD                       CPU              Gexec
> CPUs (Procs/Total) [ 1, 5, 15min] [ User, Nice, System, Idle, Wio]
>
>node001
>    8 ( 0/ 130) [ 0.00, 0.00, 0.00] [ 0.0, 0.0, 0.3, 99.6,
> 0.0] OFF
>
>node002:~ # gstat -a
>CLUSTER INFORMATION
>       Name: Viper Cluster
>      Hosts: 1
>Gexec Hosts: 0
> Dead Hosts: 0
>  Localtime: Wed Sep 30 17:24:32 2009
>
>CLUSTER HOSTS
>Hostname                     LOAD                       CPU              Gexec
> CPUs (Procs/Total) [ 1, 5, 15min] [ User, Nice, System, Idle, Wio]
>
>node002
>    8 ( 0/ 131) [ 0.00, 0.01, 0.00] [ 0.0, 0.0, 0.3, 99.7,
> 0.0] OFF
>
>viper:/etc/ganglia # gstat -a
>CLUSTER INFORMATION
>       Name: Viper Cluster
>      Hosts: 1
>Gexec Hosts: 0
> Dead Hosts: 0
>  Localtime: Wed Sep 30 17:21:08 2009
>
>CLUSTER HOSTS
>Hostname                     LOAD                       CPU              Gexec
> CPUs (Procs/Total) [ 1, 5, 15min] [ User, Nice, System, Idle, Wio]
>
>viper
>    4 ( 1/ 259) [ 0.04, 0.06, 0.04] [ 0.9, 0.1, 1.4, 96.3,
> 1.4] OFF
>viper:/etc/ganglia # gstat -a
>CLUSTER INFORMATION
>       Name: Viper Cluster
>      Hosts: 1
>Gexec Hosts: 0
> Dead Hosts: 0
>  Localtime: Wed Sep 30 17:24:50 2009
>
>CLUSTER HOSTS
>Hostname                     LOAD                       CPU              Gexec
> CPUs (Procs/Total) [ 1, 5, 15min] [ User, Nice, System, Idle, Wio]
>
>viper
>    4 ( 0/ 259) [ 0.04, 0.06, 0.04] [ 0.1, 0.0, 0.9, 98.3,
> 0.6] OFF
>
>the network on the nodes looks like this:
>
>node001:~ # netstat -ar
>Kernel IP routing table
>Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
>link-local      *               255.255.0.0     U         0 0          0 eth0
>172.16.0.0      *               255.255.0.0     U         0 0          0 eth0
>172.17.0.0      *               255.255.0.0     U         0 0          0 ib0
>loopback        *               255.0.0.0       U         0 0          0 lo
>default         viper           0.0.0.0         UG        0 0          0 eth0
>
>i'm clearly missing something simple, but i sure cannot see what it is at this
>point,
>any help greatly appreciated.
>
>-- michael
>
>
>------------------------------------------------------------------------------
>Come build with us! The BlackBerry® Developer Conference in SF, CA
>is the only developer event you need to attend this year. Jumpstart your
>developing skills, take BlackBerry mobile applications to market and stay
>ahead of the curve. Join us from November 9-12, 2009. Register now!
>http://p.sf.net/sfu/devconf
>_______________________________________________
>Ganglia-general mailing list
>[email protected]
>https://lists.sourceforge.net/lists/listinfo/ganglia-general

