Sergio,

Thanks for the suggestion. I wrangled one of our network administrators and did 
some further debugging. Before that, though, I physically removed the network 
switch (Juniper EX series) and connected two machines directly via Cat6 to see 
whether this was an OS problem. With the switch out of the path, testing with 
`omping` showed that multicast stayed alive. After that, I talked with the 
network admin some more and he looked into the switch configuration. Apparently 
the switch was set up to route multicast between different VLANs, and for some 
reason it would drop the packets after a while. Turning multicast off and 
letting the switch treat the traffic as plain broadcast completely fixed the 
issue. Perhaps someone will find this useful down the road.

Regards,

Jared

From: Sergio Ballestrero [mailto:sergio.ballestr...@gmail.com]
Sent: Thursday, February 19, 2015 12:58 AM
To: Jared David Baker
Cc: ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] Ganglia 3.6.1 and CentOS 6.5

Hello Jared,
yes, most likely this is because of multicasting.
Unless you really want to use multiple gmond as collectors, it's simpler and 
more robust to use unicast to the gmond on the host which runs gmetad.
Otherwise, to debug multicast the first thing would be to tcpdump on the host 
running gmetad, to see that you are actually receiving multicast there.
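
If tcpdump is not at hand, roughly the same check can be sketched in Python: 
join the multicast group on the gmetad host and count the datagrams that arrive 
over a short window. This is only a sketch under assumptions: the group 
239.2.11.71 is the one shown in the `netstat -gn` output quoted below, port 
8649 is assumed to be the gmond default, and the function names are 
hypothetical. Binding next to a running gmond relies on both sockets setting 
SO_REUSEADDR, which may not hold on every system.

```python
import socket
import time

GROUP = "239.2.11.71"  # group seen in the `netstat -gn` output in this thread
PORT = 8649            # assumption: the default gmond port


def membership_request(group, iface_ip="0.0.0.0"):
    """Pack an ip_mreq: 4-byte group address followed by 4-byte local address."""
    return socket.inet_aton(group) + socket.inet_aton(iface_ip)


def count_multicast_packets(group=GROUP, port=PORT, iface_ip="0.0.0.0", seconds=30.0):
    """Join `group` and return how many datagrams arrive on `port` within `seconds`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Binding to the wildcard address also lets unicast datagrams for this
    # port through, so the count is an upper bound on multicast traffic.
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, iface_ip))
    sock.settimeout(1.0)
    deadline = time.monotonic() + seconds
    count = 0
    try:
        while time.monotonic() < deadline:
            try:
                sock.recvfrom(65535)
                count += 1
            except socket.timeout:
                pass  # nothing arrived in the last second; keep waiting
    finally:
        sock.close()
    return count
```

Calling `count_multicast_packets(seconds=30)` a few times, a couple of minutes 
apart, should show whether the traffic stops arriving after a while.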

Cheers,
 Sergio

On 19 Feb 2015, at 04:16, Jared David Baker <jared.ba...@uwyo.edu> wrote:


I posted a while back, left for a bit and am now coming back. I'm attempting to 
get Ganglia working on a cluster and have followed the instructions fairly 
closely (no radical changes). The issue that I'm having is that nothing is 
aggregating to gmetad. There are times when we see compute nodes for ~2 
minutes, then they disappear and don't come back until I restart gmond (then 
another 2-minute cycle, etc.). I think it may be related to multicasting, but 
I'm not quite sure. Here is some basic output:

[root@l1 ~]# netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.3.0     0.0.0.0         255.255.255.0   U         0 0          0 ib0
192.168.1.0     0.0.0.0         255.255.255.0   U         0 0          0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 ib0
224.0.0.0       0.0.0.0         240.0.0.0       U         0 0          0 eth0
0.0.0.0         192.168.1.250   0.0.0.0         UG        0 0          0 eth0

[root@l1 ~]# netstat -gn
IPv6/IPv4 Group Memberships
Interface       RefCnt Group
--------------- ------ ---------------------
lo              1      224.0.0.1
eth0            1      239.2.11.71
eth0            1      224.0.0.1
ib0             1      224.0.0.1
lo              1      ff02::1
eth0            1      ff02::202
eth0            1      ff02::1:ff11:fb15
eth0            1      ff02::1
eth1            1      ff02::1
ib0             1      ff02::1:ff3d:1
ib0             1      ff02::1

[root@l1 ~]# ip link show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 40:16:7e:11:fb:15 brd ff:ff:ff:ff:ff:ff

So I can see that the multicast route is there and that multicast is enabled on 
the interface. If I telnet to the client host from the gmetad server, I get the 
XML data. The switches in the cluster are configured to support multicast (as 
I'm told by our networking team). However, the web server still claims that all 
nodes are down, even though I can see gmond clearly running by checking the 
process and querying the node via telnet. I haven't seen anything relevant in 
the log file or when debugging gmetad in the foreground. Any help would be 
greatly appreciated.
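
The telnet check above can also be scripted. The sketch below reads the XML 
that gmond serves over TCP and lists the hosts it reports. It assumes the 
default port 8649 and that each node appears as a `HOST` element with a `NAME` 
attribute; the function names are hypothetical. In multicast mode every gmond 
is expected to hold the state of the whole cluster, so a node that lists only 
itself would suggest that multicast packets are not reaching it.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read the XML dump gmond sends on connect (port 8649 is an assumed default)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after sending the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def hosts_seen(xml_bytes):
    """Return the sorted NAME attributes of all HOST elements in the dump."""
    root = ET.fromstring(xml_bytes)
    return sorted(h.get("NAME") for h in root.iter("HOST"))
```

For example, `hosts_seen(fetch_gmond_xml("192.168.1.10"))` would list what one 
node currently knows about; the address is a placeholder, not one taken from 
the cluster.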

Thanks!
--
Jared
_______________________________________________
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general
