good day all, 

i'm having a bit of trouble getting ganglia 3.1.2 running correctly on my
cluster. it's an SLES10sp2 x86_64 cluster. i have gmetad and gmond running on
the head node, and gmond running on nodes 1 and 2. the head node is multihomed
as usual, with eth0 on the internal network:

viper:/etc/ganglia # netstat -ar
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
239.2.11.71     *               255.255.255.255 UH        0 0          0 eth0
xxx.xxx.xxx.xxx *               255.255.252.0   U         0 0          0 eth1
link-local      *               255.255.0.0     U         0 0          0 eth0
172.16.0.0      *               255.255.0.0     U         0 0          0 eth0
172.17.0.0      *               255.255.0.0     U         0 0          0 ib0
172.17.0.0      *               255.255.0.0     U         0 0          0 ib1
loopback        *               255.0.0.0       U         0 0          0 lo
default         defaultrouter   0.0.0.0         UG        0 0          0 eth1

and i've fixed gmond to eth0 on the head node with:

/* Feel free to specify as many udp_send_channels as you like.  Gmond
   used to only support having a single channel */
udp_send_channel {
  mcast_if = eth0
  mcast_join = 239.2.11.71
  port = 8649
  ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  mcast_if = eth0
  mcast_join = 239.2.11.71
  port = 8649
  bind = 239.2.11.71
}
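
for anyone wanting to check the multicast path independently of gmond, here is
a minimal python sketch that joins the same group and port on a chosen
interface and records which source addresses send anything. the group and port
come from the config above; the interface address passed in (172.16.0.1 in the
usage note) is a made-up placeholder, and the function names are mine, not
anything from ganglia. if the bind fails because gmond already holds the port,
stopping gmond for the test may be necessary.

```python
import socket
import time

GROUP = "239.2.11.71"  # mcast_join value from the gmond.conf above
PORT = 8649            # port value from the gmond.conf above


def build_mreq(group: str, iface_ip: str) -> bytes:
    """Pack an ip_mreq: 4-byte group address followed by 4-byte local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface_ip)


def open_listener(iface_ip: str, group: str = GROUP, port: int = PORT) -> socket.socket:
    """Open a UDP socket on `port` and join `group` on the interface owning `iface_ip`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_mreq(group, iface_ip))
    return sock


def listen(iface_ip: str, seconds: float = 30.0) -> set:
    """Return the set of source addresses seen on the group within `seconds`."""
    sock = open_listener(iface_ip)
    deadline = time.monotonic() + seconds
    senders = set()
    try:
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            sock.settimeout(remaining)
            try:
                _data, (addr, _port) = sock.recvfrom(65535)
            except socket.timeout:
                break
            senders.add(addr)
    finally:
        sock.close()
    return senders
```

usage would be something like print(sorted(listen("172.16.0.1"))) on the head
node, with the real eth0 address substituted. if only the local address shows
up, packets from the other nodes are not arriving on that interface.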

i've otherwise left gmond.conf at the defaults generated by gmond -t, other
than changing the name of the cluster in the cluster section:

/*
 * The cluster attributes specified will be used as part of the <CLUSTER>
 * tag that will wrap all hosts collected by this instance.
 */
cluster {
  name = "Viper Cluster"
  owner = "mgx"
  latlong = "unspecified"
  url = "unspecified"
}


the head node and both nodes each report only themselves with gstat:

node001:~ # gstat -a
CLUSTER INFORMATION
       Name: Viper Cluster
      Hosts: 1
Gexec Hosts: 0
 Dead Hosts: 0
  Localtime: Wed Sep 30 17:24:19 2009

CLUSTER HOSTS
Hostname                     LOAD                       CPU              Gexec
 CPUs (Procs/Total) [     1,     5, 15min] [  User,  Nice, System, Idle, Wio]

node001
    8 (    0/  130) [  0.00,  0.00,  0.00] [   0.0,   0.0,   0.3,  99.6,   0.0] 
OFF

node002:~ # gstat -a
CLUSTER INFORMATION
       Name: Viper Cluster
      Hosts: 1
Gexec Hosts: 0
 Dead Hosts: 0
  Localtime: Wed Sep 30 17:24:32 2009

CLUSTER HOSTS
Hostname                     LOAD                       CPU              Gexec
 CPUs (Procs/Total) [     1,     5, 15min] [  User,  Nice, System, Idle, Wio]

node002
    8 (    0/  131) [  0.00,  0.01,  0.00] [   0.0,   0.0,   0.3,  99.7,   0.0] 
OFF

viper:/etc/ganglia # gstat -a
CLUSTER INFORMATION
       Name: Viper Cluster
      Hosts: 1
Gexec Hosts: 0
 Dead Hosts: 0
  Localtime: Wed Sep 30 17:21:08 2009

CLUSTER HOSTS
Hostname                     LOAD                       CPU              Gexec
 CPUs (Procs/Total) [     1,     5, 15min] [  User,  Nice, System, Idle, Wio]

viper
    4 (    1/  259) [  0.04,  0.06,  0.04] [   0.9,   0.1,   1.4,  96.3,   1.4] 
OFF

viper:/etc/ganglia # gstat -a
CLUSTER INFORMATION
       Name: Viper Cluster
      Hosts: 1
Gexec Hosts: 0
 Dead Hosts: 0
  Localtime: Wed Sep 30 17:24:50 2009

CLUSTER HOSTS
Hostname                     LOAD                       CPU              Gexec
 CPUs (Procs/Total) [     1,     5, 15min] [  User,  Nice, System, Idle, Wio]

viper
    4 (    0/  259) [  0.04,  0.06,  0.04] [   0.1,   0.0,   0.9,  98.3,   0.6] 
OFF
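
to make the symptom easy to re-check after each config change, here is a small
python sketch that pulls the "Hosts:" count out of captured gstat -a output and
lists the machines that see fewer hosts than expected. it only parses text in
the layout shown above; the function names and the expected count of 3 (head
node plus two nodes) are my own assumptions.

```python
import re


def gstat_host_count(output: str) -> int:
    """Return the number on the 'Hosts:' line of `gstat -a` output.

    The pattern only allows whitespace before 'Hosts:', so the
    'Gexec Hosts:' and 'Dead Hosts:' lines are not matched.
    """
    match = re.search(r"^\s*Hosts:\s*(\d+)\s*$", output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Hosts:' line found in gstat output")
    return int(match.group(1))


def nodes_missing_peers(outputs: dict, expected: int) -> list:
    """Given {hostname: gstat output}, list hosts that see fewer than `expected` hosts."""
    return sorted(name for name, text in outputs.items()
                  if gstat_host_count(text) < expected)
```

with the outputs above and expected=3, all three machines would be listed,
since each one reports Hosts: 1.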

the network on the nodes looks like this:

node001:~ # netstat -ar
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
link-local      *               255.255.0.0     U         0 0          0 eth0
172.16.0.0      *               255.255.0.0     U         0 0          0 eth0
172.17.0.0      *               255.255.0.0     U         0 0          0 ib0
loopback        *               255.0.0.0       U         0 0          0 lo
default         viper           0.0.0.0         UG        0 0          0 eth0
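
since the head node has an explicit host route for 239.2.11.71 and the nodes
do not, here is a rough python sketch for comparing which interface each
routing table would pick for the group. it is deliberately simplified and is
an assumption on my part, not how the kernel decides: it looks only for an
exact destination match and then falls back to the default route, so a broader
multicast route (for example one covering 224.0.0.0/4) would be ignored.

```python
def parse_routes(netstat_output: str) -> dict:
    """Map each destination in `netstat -ar` output to the list of its interfaces."""
    routes = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Route rows have 8 columns; skip the prompt, title, and header lines.
        if len(fields) != 8 or fields[0] == "Destination":
            continue
        routes.setdefault(fields[0], []).append(fields[-1])
    return routes


def multicast_iface(netstat_output: str, group: str = "239.2.11.71"):
    """Return the interface of an exact route to `group`, else of the default route."""
    routes = parse_routes(netstat_output)
    if group in routes:
        return routes[group][0]
    if "default" in routes:
        return routes["default"][0]
    return None
```

by this simplified logic, both tables in this post resolve to eth0: the head
node through its host route and node001 through its default route.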

i'm clearly missing something simple, but i can't see what it is at this point.
any help greatly appreciated.

-- michael


_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general
