Phil Forrest wrote:
Hello All,
Once upon a time, I had a happy ganglia monitor that was giving me
valuable data on all nodes of my 48 node cluster. Then I got a request
from a user to upgrade the kernel. After I upgraded the kernels across
the cluster, my ganglia could only see the data from the gmond running
on the head node (which also had gmetad and httpd running).
The cluster is running Red Hat 7.3 with kernel 2.4.9-34smp #1 SMP Sat
Jun 1 05:54:57 EDT 2002 i686 unknown
My cluster has 46 compute nodes with one interface (eth0) and two head
nodes with two interfaces (eth0 and eth1): one for the private LAN and
one for the campus network. The head node that runs gmetad has
"mcast_if eth1" set in its gmond.conf file. Here's the /sbin/ifconfig
slice for eth1 on the head node:
eth1 Link encap:Ethernet HWaddr 00:40:F4:2A:6E:26
inet addr:192.168.5.200 Bcast:192.168.5.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:176581970 errors:0 dropped:0 overruns:0 frame:0
TX packets:160905314 errors:0 dropped:0 overruns:0 carrier:0
collisions:0
RX bytes:1187468116 (1132.4 Mb) TX bytes:2350492219 (2241.6 Mb)
Can I trust the output of /sbin/ifconfig? (Meaning, if /sbin/ifconfig
shows the MULTICAST flag, is that the REAL truth, or can the kernel
still suppress multicast transmissions?)
The kernel's firewalling configuration can still filter out multicast
traffic. Check your firewall config (man iptables :) ). If your config is
very restrictive, at least poke a li'l hole for the multicast IP/port combo.
IIRC, the default iptables behavior changed a few point releases back in
Red Hat: the firewall is now enabled out of the box. That's apparently to
keep everyone who installs it on a desktop hanging off a cable modem from
getting owned...
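As a rough sketch, assuming ganglia's stock multicast channel of
239.2.11.71 on UDP port 8649 (check the mcast_channel/mcast_port lines
in your gmond.conf if you've changed them) and the default filter table:

  # Let the kernel join multicast groups (IGMP membership reports)
  /sbin/iptables -A INPUT -p igmp -j ACCEPT
  # Accept the ganglia multicast traffic itself
  /sbin/iptables -A INPUT -d 239.2.11.71 -p udp --dport 8649 -j ACCEPT

Run /sbin/iptables -L -n first to see what the installer actually
dropped in there.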
Also, gmetad cares not one whit about /etc/gmond.conf. I just did a
once-over on the code to make absolutely sure; there's no mention of it.
It's /etc/gmetad.conf that you should concern yourself with on the head
units if you're having display problems. Unless they're also supposed to
be part of the cluster, in which case you would configure the gmonds
separately.
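For the record, the line gmetad actually reads looks something like
this (hypothetical cluster name; syntax per the sample /etc/gmetad.conf):

  # data_source "clustername" [polling interval] host[:port] [host[:port]] ...
  data_source "My Cluster" 10 localhost:8649

gmetad polls each data_source host over TCP, which is why the routing
table remark below matters.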
Remember to open TCP port 8649 in the firewall on hosts running the
monitoring core (gmond), and TCP port 8651 on hosts running gmetad.
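Assuming a default-deny INPUT chain, something like this should do it:

  # gmond's XML report port -- gmetad connects here to poll
  /sbin/iptables -A INPUT -p tcp --dport 8649 -j ACCEPT
  # gmetad's XML port -- the web frontend and other gmetads connect here
  /sbin/iptables -A INPUT -p tcp --dport 8651 -j ACCEPT

Handy test: if gmond is happy, telnet localhost 8649 should spit back a
wad of XML at you.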
The metadaemon should be determining the path to establish its connections
via the good ol' fashioned kernel routing table, just like anything else.
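So it's worth a look there too. Assuming the usual tools (and noting
that the 224.0.0.0/4 route is the standard trick when a multi-homed box
sends multicast out the wrong interface):

  /sbin/route -n                 # dump the kernel routing table, numeric
  # If multicast is leaving via the wrong interface, pin it down:
  /sbin/route add -net 224.0.0.0 netmask 240.0.0.0 dev eth0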
As a test, I've been running gmond on one node in deaf debug mode
(send-only) and on another node in mute debug mode (listen-only). The
deaf one is pumping out data successfully and the mute one is not
seeing anything. Since this is compute node to compute node, there can
only be one interface (eth0).
There has to be something in the kernel config that is screwing this up.
That sounds like it's a firewall config issue or a router/switch config
issue to me...
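One way to narrow it down: run tcpdump on the mute (listen-only) node
while the deaf one is transmitting. Assuming the stock 239.2.11.71
channel:

  # On the listening node -- do the multicast packets even arrive?
  /sbin/tcpdump -i eth0 host 239.2.11.71

If packets show up but gmond still sees nothing, suspect the local
firewall; if nothing shows up at all, suspect the switch (IGMP
snooping) or the sender's routing table.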
I'm wondering with all the kernel upgrades going on out there, maybe
someone has had similar issues? Thanks in advance for any info!
Red Hat 7.2 / 2.4.19smp on most of our nodes here; no reported problems
with the monitoring core on any of them.
Happy Holidays To All,
-Phil Forrest
Yeah, happy Life Day, kids. ;)
Hope this info proves useful...