Sergio,
Thanks for the suggestion. I wrangled one of our network administrators and did
some further debugging. Before that, though, I physically removed the network
switch (Juniper EX series) and connected two machines directly via Cat6 to see
whether it was an OS problem. With the switch out of the path, `omping` showed
that multicast stayed alive. I then talked with the network admin again and he
looked into the switch configuration. Apparently multicast was set up to be
routed between different VLANs, and for some reason the switch would start
dropping the packets after a while. Turning multicast off, so that the switch
treats the traffic as plain broadcast, completely fixed the issue. Perhaps
someone will find this useful down the road.
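For anyone who wants to repeat the "does multicast stay alive" check without
`omping`, here is a rough sketch of a listener in Python. It joins the group
from the `netstat -gn` output quoted below (239.2.11.71) and prints a line for
every datagram, plus a warning when nothing arrives for a while. The port
(8649), the wildcard interface address, the 60-second silence limit and all
function names are my assumptions for illustration, not anything taken from
the Ganglia sources.

```python
import socket
import time

GROUP = "239.2.11.71"   # multicast group shown by `netstat -gn` in this thread
PORT = 8649             # assumption: the port gmond is configured to use
IFACE_IP = "0.0.0.0"    # assumption: let the kernel choose the interface


def membership_request(group, iface_ip):
    """Pack the 8-byte ip_mreq value used by IP_ADD_MEMBERSHIP."""
    return socket.inet_aton(group) + socket.inet_aton(iface_ip)


def open_listener(group, port, iface_ip):
    """Bind a UDP socket to the port and join the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, iface_ip))
    return sock


def watch(sock, silence_limit=60.0):
    """Print one line per datagram, and a warning after a silent period."""
    sock.settimeout(silence_limit)
    while True:
        stamp = time.strftime("%H:%M:%S")
        try:
            data, (sender, _port) = sock.recvfrom(65535)
            print(f"{stamp} {len(data):5d} bytes from {sender}")
        except socket.timeout:
            print(f"{stamp} nothing received for {silence_limit:.0f}s")


if __name__ == "__main__":
    watch(open_listener(GROUP, PORT, IFACE_IP))
```

Left running for ten minutes or so on the gmetad host, it should make it
obvious whether traffic from the other nodes stops after a couple of minutes.
If gmond on the same host already holds the port, the bind may fail, in which
case stopping gmond for the duration of the test is the simplest workaround.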
Regards,
Jared
From: Sergio Ballestrero [mailto:sergio.ballestr...@gmail.com]
Sent: Thursday, February 19, 2015 12:58 AM
To: Jared David Baker
Cc: ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] Ganglia 3.6.1 and CentOS 6.5
Hello Jared,
yes, most likely this is because of multicasting.
Unless you really want to use multiple gmond as collectors, it's simpler and
more robust to use unicast to the gmond on the host which runs gmetad.
Otherwise, to debug multicast the first thing would be to tcpdump on the host
running gmetad, to see that you are actually receiving multicast there.
Cheers,
Sergio
On 19 Feb 2015, at 04:16, Jared David Baker <jared.ba...@uwyo.edu> wrote:
I posted a while back, left for a bit and am now coming back. I'm attempting to
get Ganglia working on a cluster and have followed the instructions fairly
closely (no radical changes). The issue I'm having is that nothing is
aggregating to gmetad. At times we see the compute nodes for ~2 minutes, then
they disappear and don't come back until I restart gmond (then another 2-minute
cycle, and so on). I think it may be related to multicasting, but I'm not sure.
Here is some basic output:
[root@l1 ~]# netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.3.0     0.0.0.0         255.255.255.0   U         0 0          0 ib0
192.168.1.0     0.0.0.0         255.255.255.0   U         0 0          0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 ib0
224.0.0.0       0.0.0.0         240.0.0.0       U         0 0          0 eth0
0.0.0.0         192.168.1.250   0.0.0.0         UG        0 0          0 eth0
[root@l1 ~]# netstat -gn
IPv6/IPv4 Group Memberships
Interface       RefCnt Group
--------------- ------ ---------------------
lo              1      224.0.0.1
eth0            1      239.2.11.71
eth0            1      224.0.0.1
ib0             1      224.0.0.1
lo              1      ff02::1
eth0            1      ff02::202
eth0            1      ff02::1:ff11:fb15
eth0            1      ff02::1
eth1            1      ff02::1
ib0             1      ff02::1:ff3d:1
ib0             1      ff02::1
[root@l1 ~]# ip link show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 40:16:7e:11:fb:15 brd ff:ff:ff:ff:ff:ff
So I can see that the multicast route is there and that multicast is enabled on
the interface. If I telnet to the client host from the gmetad server, I get the XML
data. The switches in the cluster are configured to support multicast (as I'm
told by our networking team). However, the web server still claims that all
nodes are down, even though I can see gmond clearly running by checking the
process and querying the node via telnet. I haven't seen anything relevant in
the log file or when debugging gmetad in the foreground. Any help would be
greatly appreciated.
Thanks!
--
Jared
------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=190641631&iu=/4140/ostg.clktrk
_______________________________________________
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general