Hello all,
I'm hoping someone can point me in the right direction. After an
upgrade to Red Hat EL 5.6 earlier this week, my gmond collector only
shows localhost and none of the hosts sending gmond multicast traffic
from the other nodes. I can see the multicast traffic reaching this
server, but 'gstat -a' lists nothing but the collector itself.

It's quite bizarre, because my Ganglia nodes didn't disappear until
about 24 hours after the upgrade had been completed. I've verified that
SELinux is disabled, and the issue persists. iptables has also been
disabled (just in case), and it still persists. The gmond.conf file at
/etc/ganglia/gmond.conf was copied back directly from backup *and* from
known-working nodes. Note: the non-upgraded nodes correctly list all
other nodes with 'gstat -a'.

I'm really at a loss here. Any pointers on where to look are
appreciated; I'm hoping it's something simple I'm stupidly overlooking.
Relevant info below. Thanks in advance!
--gmond multicast traffic is reaching this gmond collector
# tcpdump -i any ip multicast
tcpdump: WARNING: Promiscuous mode not supported on the "any" device
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 96 bytes
11:31:16.997129 IP mgt1.36677 > 239.2.11.71.8649: UDP, length 44
11:31:22.913169 IP node011.45309 > 239.2.11.71.8649: UDP, length 52
11:31:22.913177 IP node011.45309 > 239.2.11.71.8649: UDP, length 52
11:31:22.913279 IP node006.50998 > 239.2.11.71.8649: UDP, length 52
11:31:22.913285 IP node006.50998 > 239.2.11.71.8649: UDP, length 52
11:31:22.913503 IP node004.48911 > 239.2.11.71.8649: UDP, length 52
11:31:22.913511 IP node004.48911 > 239.2.11.71.8649: UDP, length 52
11:31:22.916303 IP node005.56330 > 239.2.11.71.8649: UDP, length 48
11:31:22.918336 IP node006.50998 > 239.2.11.71.8649: UDP, length 48
*snip*
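To put a number on what tcpdump sees, so it can be compared directly with the "Hosts:" count gstat reports, a small helper can pull the distinct sender hosts out of capture lines like the ones above. This is a hypothetical sketch: it assumes tcpdump's default one-line UDP summary format (`TIME IP src.port > dst.port: UDP, length N`), and the function name `senders_to_group` is made up for illustration.

```python
import re
from typing import Iterable, Set

# Matches tcpdump's default one-line UDP summary (assumed format), e.g.
# "11:31:22.913169 IP node011.45309 > 239.2.11.71.8649: UDP, length 52"
# Backtracking splits the trailing ".port" off both hostnames and IPs.
LINE_RE = re.compile(
    r"^\S+ IP (?P<src>\S+)\.(?P<sport>\d+) > (?P<dst>\S+)\.(?P<dport>\d+): UDP"
)


def senders_to_group(lines: Iterable[str], group: str = "239.2.11.71",
                     port: int = 8649) -> Set[str]:
    """Return the distinct source hosts that sent UDP to group:port."""
    hosts = set()
    for line in lines:
        m = LINE_RE.match(line.strip())
        # Ignore banner/warning lines and traffic to other groups or ports.
        if m and m.group("dst") == group and int(m.group("dport")) == port:
            hosts.add(m.group("src"))
    return hosts
```

For example, `senders_to_group(sys.stdin)` fed from `tcpdump -n -c 200 -i any ip multicast` returns the set of senders seen in 200 packets; with `-n` the senders are IP addresses rather than hostnames.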
--But gmond isn't seeing any nodes besides itself
# gstat -a
CLUSTER INFORMATION
Name: Cluster
Hosts: 1
Gexec Hosts: 0
Dead Hosts: 0
Localtime: Thu Jul 14 11:32:29 2011
CLUSTER HOSTS
Hostname                     LOAD                       CPU              Gexec
 CPUs (Procs/Total) [     1,     5, 15min] [  User,  Nice, System, Idle, Wio]
mgn2
    8 (    0/  618) [  0.15,  0.36,  0.48] [   0.6,   0.0,   0.2,  98.8,   0.4] OFF
--gmetad.conf
# cat gmetad.conf |grep -v "#"
data_source "Cluster" 10 localhost
all_trusted on
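The `data_source` line has gmetad poll the gmond on localhost, and gmond answers on its TCP accept channel (port 8649, per the `tcp_accept_channel` line gmond prints at startup) with an XML dump of every host it knows about. Reading that dump directly would show whether gmond itself holds only one host, which takes gstat and gmetad out of the picture. A minimal sketch, assuming the default port and that hosts appear as `HOST` elements with a `NAME` attribute; both function names are hypothetical.

```python
import socket
import xml.etree.ElementTree as ET
from typing import List


def fetch_gmond_xml(host: str = "localhost", port: int = 8649,
                    timeout: float = 5.0) -> bytes:
    """Read the full XML dump gmond writes on its TCP accept channel."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after sending the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def host_names(xml_dump: bytes) -> List[str]:
    """List the NAME attribute of every HOST element in the dump."""
    root = ET.fromstring(xml_dump)
    return [h.get("NAME") for h in root.iter("HOST")]
```

Usage: `print(host_names(fetch_gmond_xml()))` on the collector. If that list also contains only the collector, the problem sits in gmond's receive path rather than in gmetad or gstat.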
--gmond.conf after I added receive channels for *every* interface.
This wasn't in the original working config, but I was trying anything
at this point.
# cat gmond.conf
/* This configuration is as close to 2.5.x default behavior as possible
The values closely match ./gmond/metric.h definitions in 2.5.x */
globals {
daemonize = yes
setuid = yes
user = nobody
debug_level = 2
max_udp_msg_len = 1472
mute = no
deaf = no
allow_extra_data = yes
host_dmax = 0 /*secs */
cleanup_threshold = 300 /*secs */
gexec = no
send_metadata_interval = 0 /*secs */
}
/*
* The cluster attributes specified will be used as part of the <CLUSTER>
* tag that will wrap all hosts collected by this instance.
*/
cluster {
name = "Cluster"
owner = "n/a"
latlong = "n/a"
url = "n/a"
}
/* The host section describes attributes of the host, like the location */
host {
location = "n/a"
}
/* Feel free to specify as many udp_send_channels as you like. Gmond
   used to only support having a single channel */
udp_send_channel {
mcast_join = 239.2.11.71
port = 8649
ttl = 1
mcast_if=eth0
}
/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
mcast_join = 239.2.11.71
port = 8649
bind = 239.2.11.71
mcast_if=eth0
}
udp_recv_channel {
mcast_join = 239.2.11.71
port = 8649
bind = 239.2.11.71
mcast_if=eth0:1
}
udp_recv_channel {
mcast_join = 239.2.11.71
port = 8649
bind = 239.2.11.71
mcast_if=eth1
}
udp_recv_channel {
mcast_join = 239.2.11.71
port = 8649
bind = 239.2.11.71
mcast_if=eth2
}
udp_recv_channel {
mcast_join = 239.2.11.71
port = 8649
bind = 239.2.11.71
mcast_if=eth3
}
*snip modules*
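tcpdump captures packets at the link layer, before the kernel decides whether to deliver them to a listening socket, so seeing the traffic in tcpdump does not by itself prove that a socket joined to the group receives it. One way to check is a throwaway listener that does roughly what each `udp_recv_channel` above does: bind to 239.2.11.71:8649, join the group, and tally datagrams per sender. This is a sketch under stated assumptions: it assumes Python 3 on the collector, that gmond is stopped first (`service gmond stop`) so the port is free, and that `iface_ip` is the address of the interface facing the nodes (0.0.0.0 lets the kernel choose). `build_mreq` and `count_senders` are hypothetical names.

```python
import socket
import struct
from collections import Counter


def build_mreq(group: str, iface_ip: str = "0.0.0.0") -> bytes:
    """Pack a struct ip_mreq: multicast group address, then local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface_ip))


def count_senders(group: str = "239.2.11.71", port: int = 8649,
                  iface_ip: str = "0.0.0.0", packets: int = 50,
                  timeout: float = 30.0) -> Counter:
    """Join group on iface_ip, bind to group:port, and tally datagrams per sender IP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        # Same bind address the udp_recv_channels in gmond.conf use.
        sock.bind((group, port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        build_mreq(group, iface_ip))
        sock.settimeout(timeout)
        tally = Counter()
        for _ in range(packets):
            try:
                _, (src_ip, _src_port) = sock.recvfrom(2048)
            except socket.timeout:
                break  # nothing arrived within `timeout` seconds
            tally[src_ip] += 1
        return tally
    finally:
        sock.close()
```

If `count_senders()` comes back empty while tcpdump still shows the other nodes, that would suggest the packets are lost after capture but before socket delivery, which points away from gmond.conf itself.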
--debug start-up output
# service gmond restart
Shutting down GANGLIA gmond: [ OK ]
Starting GANGLIA gmond: loaded module: core_metrics
loaded module: cpu_module
loaded module: disk_module
loaded module: load_module
loaded module: mem_module
loaded module: net_module
loaded module: proc_module
loaded module: sys_module
udp_recv_channel mcast_join=239.2.11.71 mcast_if=eth0 port=8649 bind=239.2.11.71
udp_recv_channel mcast_join=239.2.11.71 mcast_if=eth0:1 port=8649 bind=239.2.11.71
udp_recv_channel mcast_join=239.2.11.71 mcast_if=eth1 port=8649 bind=239.2.11.71
udp_recv_channel mcast_join=239.2.11.71 mcast_if=eth2 port=8649 bind=239.2.11.71
udp_recv_channel mcast_join=239.2.11.71 mcast_if=eth3 port=8649 bind=239.2.11.71
tcp_accept_channel bind=NULL port=8649
udp_send_channel mcast_join=239.2.11.71 mcast_if=eth0 host=NULL port=8649
metric 'cpu_user' being collected now
metric 'cpu_user' has value_threshold 1.000000
_______________________________________________
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general