Phil Forrest wrote:
Hello All,

Once upon a time, I had a happy Ganglia monitor that was giving me valuable data on all nodes of my 48-node cluster. Then I got a request from a user to upgrade the kernel. After I upgraded the kernels across the cluster, Ganglia could only see data from the gmond running on the head node (which also had gmetad and httpd running).

The cluster is running Red Hat 7.3 with kernel 2.4.9-34smp (#1 SMP Sat Jun 1 05:54:57 EDT 2002 i686 unknown).

My cluster has 46 compute nodes with one interface (eth0) and two head nodes with two interfaces (eth0 and eth1), one for the private LAN and one for the campus network. My head node that has gmetad running has "mcast_if eth1" set in its gmond.conf file. Here's the /sbin/ifconfig slice for eth1 on the head node:

eth1      Link encap:Ethernet  HWaddr 00:40:F4:2A:6E:26
          inet addr:192.168.5.200  Bcast:192.168.5.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:176581970 errors:0 dropped:0 overruns:0 frame:0
          TX packets:160905314 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0
          RX bytes:1187468116 (1132.4 Mb)  TX bytes:2350492219 (2241.6 Mb)

Can I trust the output of /sbin/ifconfig? (Meaning: if /sbin/ifconfig shows the MULTICAST flag on the interface, is that the real truth, or can the kernel still suppress multicast traffic?)

The kernel's firewalling configuration can still filter out multicast traffic. Check your firewall config (man iptables :) ). If your config is very restrictive, at least poke a li'l hole for the multicast IP/port combo.
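For example, assuming the stock Ganglia multicast channel 239.2.11.71 and port 8649 (adjust to whatever mcast_channel/mcast_port your gmond.conf actually uses), a sketch of the hole you'd poke on each node:

    # Allow Ganglia's multicast traffic in (assumes the default
    # channel/port; check mcast_channel and mcast_port in gmond.conf).
    /sbin/iptables -A INPUT -p udp -d 239.2.11.71 --dport 8649 -j ACCEPT
    # Persist the rule across reboots via the Red Hat init script.
    /sbin/service iptables save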

IIRC, the default iptables behavior changed a few point releases back in Red Hat: the firewall is now enabled out of the box. This is apparently to keep everyone who installs it on a desktop hanging off a cable modem from getting owned...
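You can check whether it's actually in play with something like:

    # List the current filter rules; a default-deny INPUT chain or a
    # pile of REJECT rules means the installer's firewall is active.
    /sbin/iptables -L -n -v
    # See whether the iptables service starts at boot.
    /sbin/chkconfig --list iptables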

Also, gmetad cares not one whit about /etc/gmond.conf; I just did a once-over of the code to make absolutely sure, and there's no mention of it. It's /etc/gmetad.conf that you should concern yourself with on the head nodes if you're having display problems. Unless the head nodes are also supposed to be part of the cluster, in which case you'd configure their gmonds separately.
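For reference, the bit of /etc/gmetad.conf that matters here is the data_source line; a minimal sketch (the cluster name and node names are made up):

    # /etc/gmetad.conf -- tell gmetad which cluster(s) to poll.
    # Format: data_source "name" [polling interval in seconds] host ...
    data_source "mycluster" 10 node01 node02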

Remember to open TCP port 8649 in the firewall on hosts running the monitoring core (gmond) and TCP port 8651 on hosts running gmetad.
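In iptables terms, that would look something like:

    # gmond's XML report port, needed wherever gmetad polls a gmond.
    /sbin/iptables -A INPUT -p tcp --dport 8649 -j ACCEPT
    # gmetad's own XML port, needed on the head node(s).
    /sbin/iptables -A INPUT -p tcp --dport 8651 -j ACCEPT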

The metadaemon should determine the path for its connections via the good ol' fashioned kernel routing table, just like anything else.
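Which suggests one thing worth checking after a kernel swap: whether the new kernel still has a route covering the multicast range. A sketch of the check and the usual fix:

    # Look for a 224.0.0.0/4 route in the table.
    /sbin/route -n
    # If there isn't one, point the multicast range at the cluster
    # interface (eth1 on the head node, eth0 on the compute nodes).
    /sbin/route add -net 224.0.0.0 netmask 240.0.0.0 dev eth0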

As a test, I've been running gmond on one node in deaf debug mode and on another node in mute debug mode. The deaf one is pumping out data successfully, but the mute one isn't seeing anything. Since this is compute node to compute node, only one interface (eth0) is involved. There has to be something in the kernel config that is screwing this up.

That sounds like a firewall config issue or a router/switch config issue to me...
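Before blaming the switch, it's worth watching the wire on the mute (listening) node; assuming the default 239.2.11.71 channel, something like:

    # Do the deaf node's multicast packets even arrive here?
    /usr/sbin/tcpdump -i eth0 host 239.2.11.71

Packets showing up in tcpdump but never reaching gmond points at the local firewall; nothing arriving at all points at the switch/router (e.g. IGMP snooping eating the traffic).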

I'm wondering, with all the kernel upgrades going on out there, whether anyone else has run into similar issues. Thanks in advance for any info!

We're running 7.2 / 2.4.19smp on most of our nodes here, with no reported problems with the monitoring core on any of them.

Happy Holidays To All,
-Phil Forrest

Yeah, happy Life Day, kids. ;)

Hope this info proves useful...

