Thanks for the info, Steve... I think I've found part of the problem. I
pointed gmetad at the gmond on node02 instead of the local gmond. Here's
what I get:
(NOTE: prior to running this, node02 is running gmond at debug level 2,
and I can see the data that gmond is writing to stdout on node02.)
[EMAIL PROTECTED] etc]# /usr/sbin/gmetad
Setting debug level to 2
Datasource = [Node02]
Trying to connect to 192.168.5.2:8649 for [Node02]
Data inserted for [Node02] into sources hash
Going to run as user nobody
Sources are ...
Source: [Node02] has 1 sources
192.168.5.2
listening on port 8651
3076 is monitoring [Node02] data source
192.168.5.2
save_to_rrd() XML_ParseBuffer() error at line 1:
no element found
data_thread() couldn't parse the XML and data to RRD for [Node02]
[Node02] is a dead source
save_to_rrd() XML_ParseBuffer() error at line 1:
no element found
Is this saying that no data is coming from node02, or that gmetad cannot
save to the round-robin database (RRD) for some reason?
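(If I'm reading the expat error right, "no element found" at line 1 means
the parser was handed an empty document - i.e. the TCP connection to
192.168.5.2:8649 returned no XML at all, rather than the RRD write failing.
One way to check by hand, assuming telnet or netcat is on the box:

telnet 192.168.5.2 8649

A healthy gmond should immediately dump a screenful of <GANGLIA_XML> output
and close the connection; an empty reply or "connection refused" there
would explain the error above.)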
Additionally, I checked the firewall config on the internal compute nodes.
I'm no expert on iptables, but it looks like there are NO rules:
[EMAIL PROTECTED] etc]# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
[EMAIL PROTECTED] etc]# ipchains -L
ipchains: Incompatible with this kernel
[EMAIL PROTECTED] etc]#
Note that node04 had the exact same output for iptables -L. Additionally, I
set hosts.allow to ALL:ALL on node02 and node04, then repeated the test
with node02 running in deaf debug mode and node04 running in mute debug
mode. Node04 never received any data; it just kept calling the cleanup
thread.
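Next I guess I'll try tcpdump on the mute node to see whether the multicast
packets even reach its wire. Assuming the default gmond channel of
239.2.11.71 (we only set mcast_if, not mcast_channel, in gmond.conf):

tcpdump -i eth0 host 239.2.11.71

Since tcpdump puts eth0 into promiscuous mode, it should see the multicast
frames even if the kernel never joined the group - so packets in tcpdump
but nothing in gmond would point at the kernel, while no packets at all
would point at the sender or the switch.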
I'm stumped... I guess the parse buffer error above means that there's no
buffer to parse?
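One more thing worth checking on the receiving side is whether the kernel
actually joined the multicast group. While gmond is running on node04,
something like

netstat -g

(or cat /proc/net/igmp) should list 239.2.11.71 against eth0, again
assuming the default channel. If the group never shows up, maybe the new
kernel was built without CONFIG_IP_MULTICAST.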
Thanks again,
-Phil
At 12:46 PM 12/19/2002, you wrote:
Phil Forrest wrote:
Hello All,
Once upon a time, I had a happy ganglia monitor that was giving me
valuable data on all nodes of my 48-node cluster. Then I got a request
from a user to upgrade the kernel. After I upgraded the kernels across
the cluster, my ganglia could only see the data from the gmond running on
the head node (which also had gmetad and httpd running).
The cluster is running Red Hat 7.3 with kernel 2.4.9-34smp #1 SMP Sat Jun
1 05:54:57 EDT 2002 i686 unknown
My cluster has 46 compute nodes with one interface (eth0), and two head
nodes with two interfaces (eth0 and eth1): one for the private LAN and one
for the campus network. The head node that runs gmetad has
"mcast_if eth1" set in its gmond.conf file. Here's the /sbin/ifconfig
slice for eth1 on the head node:
eth1 Link encap:Ethernet HWaddr 00:40:F4:2A:6E:26
inet addr:192.168.5.200 Bcast:192.168.5.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:176581970 errors:0 dropped:0 overruns:0 frame:0
TX packets:160905314 errors:0 dropped:0 overruns:0 carrier:0
collisions:0
RX bytes:1187468116 (1132.4 Mb) TX bytes:2350492219 (2241.6 Mb)
Can I trust the output of /sbin/ifconfig? Meaning: if /sbin/ifconfig says
MULTICAST is enabled, is that the real truth, or can the kernel still
suppress multicast transmissions?
The kernel's firewalling configuration can still filter out multicast
traffic. Check your firewall config (man iptables :) ). If your config
is very restrictive, at least poke a li'l hole for the multicast IP/port combo.
IIRC, the default iptables behavior changed a few point releases back in
Red Hat - it's now on by default. Apparently that's to keep everyone who
installs it on a desktop connected to the net via cable modem from
getting owned...
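With the stock gmond settings (mcast_channel 239.2.11.71, mcast_port 8649 -
adjust if gmond.conf says otherwise), that hole would look something like:

iptables -A INPUT -p udp -d 239.2.11.71 --dport 8649 -j ACCEPT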
Also, gmetad cares not one whit about /etc/gmond.conf. I just did a
once-over on the code to make absolutely sure; there's no mention of it.
It's /etc/gmetad.conf that you should concern yourself with on the head
units if you're having display problems. Unless they're also supposed to
be part of the cluster, in which case you would configure the gmonds
separately.
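For reference, the only part of /etc/gmetad.conf that really matters here
is the data_source line. Going by the debug output above, yours presumably
looks something like:

data_source "Node02" 192.168.5.2:8649

One data_source line per cluster; gmetad polls each listed address over TCP.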
Remember to open TCP port 8649 in the firewall on hosts running the
monitoring core (gmond), and TCP port 8651 on hosts running gmetad.
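On a box whose policy isn't ACCEPT, rules along these lines would do it:

iptables -A INPUT -p tcp --dport 8649 -j ACCEPT   # on the gmond hosts
iptables -A INPUT -p tcp --dport 8651 -j ACCEPT   # on the gmetad host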
The metadaemon should be determining the path to establish its connections
via the good ol' fashioned kernel routing table, just like anything else.
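If there's any doubt, a quick

route -n

on the head node will show which interface covers the 192.168.5.0/24 side
(netstat -rn shows the same thing).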
As a test, I've been running gmond on one node in deaf debug mode, and on
another node in mute debug mode. The deaf one is pumping out data
successfully and the mute one is not seeing anything. Since this is
compute node to compute node, there can only be one interface (eth0).
There has to be something in the kernel config that is screwing this up.
That sounds like it's a firewall config issue or a router/switch config
issue to me...
I'm wondering, with all the kernel upgrades going on out there, whether
someone else has had similar issues? Thanks in advance for any info!
7.2 / 2.4.19smp on most of our nodes here, no reported problems with the
monitoring core on any of them.
Happy Holidays To All,
-Phil Forrest
Yeah, happy Life Day, kids. ;)
Hope this info proves useful...
Phil Forrest
334-844-6910
Auburn University Dept. of Physics
Network & Scientific Computing
207 Leach Science Center