good day all,
i'm having a bit of trouble getting ganglia 3.1.2 running correctly on my cluster. it's
an SLES10sp2 x86_64 cluster. i have gmetad and gmond running on the head node, and
gmond running on nodes 1 and 2. the head node is multihomed as usual, with eth0 as
the internal network:
viper:/etc/ganglia # netstat -ar
Kernel IP routing table
Destination      Gateway          Genmask          Flags  MSS Window irtt Iface
239.2.11.71      *                255.255.255.255  UH       0 0         0 eth0
xxx.xxx.xxx.xxx  *                255.255.252.0    U        0 0         0 eth1
link-local       *                255.255.0.0      U        0 0         0 eth0
172.16.0.0       *                255.255.0.0      U        0 0         0 eth0
172.17.0.0       *                255.255.0.0      U        0 0         0 ib0
172.17.0.0       *                255.255.0.0      U        0 0         0 ib1
loopback         *                255.0.0.0        U        0 0         0 lo
default          defaultrouter    0.0.0.0          UG       0 0         0 eth1
and i've bound gmond to eth0 on the head node with:
/* Feel free to specify as many udp_send_channels as you like. Gmond
   used to only support having a single channel */
udp_send_channel {
  mcast_if = eth0
  mcast_join = 239.2.11.71
  port = 8649
  ttl = 1
}
/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  mcast_if = eth0
  mcast_join = 239.2.11.71
  port = 8649
  bind = 239.2.11.71
}
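
To check independently of gmond whether multicast traffic for this group arrives on a given interface, something like the following could be run on each machine. It is a minimal sketch, not part of ganglia: the function names are made up for illustration, the group and port are taken from the config above, and binding to the multicast address assumes Linux behaviour.

```python
import socket
import struct

GROUP = "239.2.11.71"  # mcast_join value from the config above
PORT = 8649            # port value from the config above


def membership_request(group: str, iface_ip: str) -> bytes:
    """Pack an ip_mreq struct: the multicast group, then the local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface_ip))


def listen(iface_ip: str, group: str = GROUP, port: int = PORT, count: int = 5) -> None:
    """Join the group on one interface and print the sender of each datagram received."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Allow this socket to coexist with a running gmond on the same port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Mirrors "bind = 239.2.11.71": only datagrams addressed to the group are delivered.
    sock.bind((group, port))
    # Mirrors "mcast_join" plus "mcast_if": join the group on the chosen interface.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, iface_ip))
    for _ in range(count):
        data, sender = sock.recvfrom(65535)
        print(f"{len(data)} bytes from {sender[0]}")
```

Calling listen() with the machine's own eth0 address (for example listen("172.16.0.1"), where that address is hypothetical) should print datagrams from the other hosts if their multicast packets reach this interface. If only the local address ever shows up, the packets are probably not crossing the network.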
apart from that, gmond.conf is the default i generated with gmond -t; the only
other change is the name of the cluster in the cluster container:
/*
 * The cluster attributes specified will be used as part of the <CLUSTER>
 * tag that will wrap all hosts collected by this instance.
 */
cluster {
  name = "Viper Cluster"
  owner = "mgx"
  latlong = "unspecified"
  url = "unspecified"
}
the head node and the compute nodes each report only themselves with gstat:
node001:~ # gstat -a
CLUSTER INFORMATION
Name: Viper Cluster
Hosts: 1
Gexec Hosts: 0
Dead Hosts: 0
Localtime: Wed Sep 30 17:24:19 2009
CLUSTER HOSTS
Hostname LOAD CPU Gexec
CPUs (Procs/Total) [ 1, 5, 15min] [ User, Nice, System, Idle, Wio]
node001
8 ( 0/ 130) [ 0.00, 0.00, 0.00] [ 0.0, 0.0, 0.3, 99.6, 0.0] OFF
node002:~ # gstat -a
CLUSTER INFORMATION
Name: Viper Cluster
Hosts: 1
Gexec Hosts: 0
Dead Hosts: 0
Localtime: Wed Sep 30 17:24:32 2009
CLUSTER HOSTS
Hostname LOAD CPU Gexec
CPUs (Procs/Total) [ 1, 5, 15min] [ User, Nice, System, Idle, Wio]
node002
8 ( 0/ 131) [ 0.00, 0.01, 0.00] [ 0.0, 0.0, 0.3, 99.7, 0.0] OFF
viper:/etc/ganglia # gstat -a
CLUSTER INFORMATION
Name: Viper Cluster
Hosts: 1
Gexec Hosts: 0
Dead Hosts: 0
Localtime: Wed Sep 30 17:21:08 2009
CLUSTER HOSTS
Hostname LOAD CPU Gexec
CPUs (Procs/Total) [ 1, 5, 15min] [ User, Nice, System, Idle, Wio]
viper
4 ( 1/ 259) [ 0.04, 0.06, 0.04] [ 0.9, 0.1, 1.4, 96.3, 1.4] OFF
viper:/etc/ganglia # gstat -a
CLUSTER INFORMATION
Name: Viper Cluster
Hosts: 1
Gexec Hosts: 0
Dead Hosts: 0
Localtime: Wed Sep 30 17:24:50 2009
CLUSTER HOSTS
Hostname LOAD CPU Gexec
CPUs (Procs/Total) [ 1, 5, 15min] [ User, Nice, System, Idle, Wio]
viper
4 ( 0/ 259) [ 0.04, 0.06, 0.04] [ 0.1, 0.0, 0.9, 98.3, 0.6] OFF
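
The same host count can be cross-checked without gstat by reading the XML that gmond serves over TCP. The sketch below assumes the tcp_accept_channel is still on the default port 8649 from gmond -t and that the dump has the usual GANGLIA_XML / CLUSTER / HOST nesting; the function names are illustrative only.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_xml(host: str = "localhost", port: int = 8649) -> bytes:
    """Read the full XML dump that gmond writes when a TCP client connects."""
    chunks = []
    with socket.create_connection((host, port), timeout=5) as conn:
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # gmond closes the connection after sending the dump
                break
            chunks.append(chunk)
    return b"".join(chunks)


def hosts_by_cluster(xml_data) -> dict:
    """Map each CLUSTER NAME to the list of HOST NAMEs reported under it."""
    root = ET.fromstring(xml_data)
    return {
        cluster.get("NAME"): [host.get("NAME") for host in cluster.iter("HOST")]
        for cluster in root.iter("CLUSTER")
    }
```

Running hosts_by_cluster(fetch_xml("viper")) should list every host that the head node's gmond has heard from. A list containing only viper would match the gstat output above.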
the routing table on the nodes looks like this:
node001:~ # netstat -ar
Kernel IP routing table
Destination      Gateway          Genmask          Flags  MSS Window irtt Iface
link-local       *                255.255.0.0      U        0 0         0 eth0
172.16.0.0       *                255.255.0.0      U        0 0         0 eth0
172.17.0.0       *                255.255.0.0      U        0 0         0 ib0
loopback         *                255.0.0.0        U        0 0         0 lo
default          viper            0.0.0.0          UG       0 0         0 eth0
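
As a rough way to read the two routing tables, the sketch below does a plain longest-prefix match to see which interface the multicast group would leave on. It is a simplification: the real kernel lookup also considers metrics, and gmond's mcast_if setting may override the route. The masked xxx.xxx.xxx.xxx entry is left out because its value is not shown, and the numeric ranges for link-local (169.254.0.0/16) and loopback (127.0.0.0/8) are assumed from the genmasks.

```python
import ipaddress

# Routes transcribed from the netstat output in this post (see assumptions above).
VIPER_ROUTES = [
    ("239.2.11.71/32", "eth0"),
    ("169.254.0.0/16", "eth0"),  # link-local (assumed range)
    ("172.16.0.0/16", "eth0"),
    ("172.17.0.0/16", "ib0"),
    ("127.0.0.0/8", "lo"),       # loopback (assumed range)
    ("0.0.0.0/0", "eth1"),       # default
]
NODE_ROUTES = [
    ("169.254.0.0/16", "eth0"),  # link-local (assumed range)
    ("172.16.0.0/16", "eth0"),
    ("172.17.0.0/16", "ib0"),
    ("127.0.0.0/8", "lo"),       # loopback (assumed range)
    ("0.0.0.0/0", "eth0"),       # default via viper
]


def pick_iface(routes, dest):
    """Return the interface of the most specific route containing dest, or None."""
    addr = ipaddress.ip_address(dest)
    matches = [
        (ipaddress.ip_network(net), iface)
        for net, iface in routes
        if addr in ipaddress.ip_network(net)
    ]
    if not matches:
        return None
    # The longest prefix (most specific network) wins.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


print(pick_iface(VIPER_ROUTES, "239.2.11.71"))      # with the explicit host route
print(pick_iface(VIPER_ROUTES[1:], "239.2.11.71"))  # same table without that route
print(pick_iface(NODE_ROUTES, "239.2.11.71"))
```

By this reading, the nodes send the group out eth0 through the default route, and the head node would fall back to eth1 if the explicit 239.2.11.71 host route were missing.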
i'm clearly missing something simple, but i sure cannot see what it is at this
point.
any help is greatly appreciated.
-- michael
_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general