Hello Jared,
Yes, this is most likely caused by multicast.
Unless you really need multiple gmond instances acting as collectors, it's simpler and
more robust to send metrics via unicast to the gmond on the host that runs gmetad.
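Whichever transport you end up with, gmetad only sees what the gmond it polls has heard. Below is a minimal Python sketch of the same check you did with telnet, pointed at that gmond. It reads the XML dump and lists each HOST with its TN attribute (seconds since that gmond last heard from the host). Port 8649 and the HOST/NAME/TN names are assumptions based on gmond's usual defaults, and `fetch_gmond_xml` / `parse_hosts` are hypothetical helper names, not part of Ganglia.

```python
# Hypothetical helpers (not part of Ganglia): dump a gmond's XML and list its hosts.
# Assumptions: gmond's tcp_accept_channel is on port 8649, and its XML contains
# <HOST NAME="..." TN="..."> elements, where TN = seconds since last heard.
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host: str, port: int = 8649, timeout: float = 5.0) -> bytes:
    """Read the whole XML dump that gmond sends on connect (what telnet shows)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection once the dump is complete
                break
            chunks.append(data)
    return b"".join(chunks)


def parse_hosts(xml_bytes: bytes) -> list[tuple[str, int]]:
    """Return (NAME, TN) for every HOST element; TN is -1 if the attribute is absent."""
    root = ET.fromstring(xml_bytes)  # bytes, so expat honors the declared encoding
    return [(h.get("NAME", "?"), int(h.get("TN", "-1"))) for h in root.iter("HOST")]
```

For example, `for name, tn in parse_hosts(fetch_gmond_xml("localhost")): print(name, tn)` run on the gmetad host. A TN that keeps climbing for a node suggests that gmond is no longer hearing from it.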
Otherwise, the first step in debugging multicast is to run tcpdump on the host
running gmetad, to confirm that multicast traffic is actually arriving there.
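If you want a rough complement to tcpdump, a small Python listener can join the group itself and count packets per sender. This sketch assumes the group 239.2.11.71 from your `netstat -gn` output and gmond's usual UDP port 8649; `make_mreq` and `count_multicast` are hypothetical names. Binding to 8649 may fail while gmond is running unless gmond also set SO_REUSEADDR, in which case tcpdump is the easier route.

```python
# Hypothetical multicast check (not part of Ganglia): join a group, count packets per sender.
# Assumptions: group 239.2.11.71 and UDP port 8649; adjust both to match gmond.conf.
import socket
import struct
import time


def make_mreq(group: str, iface_ip: str = "0.0.0.0") -> bytes:
    """Build the struct ip_mreq for IP_ADD_MEMBERSHIP: group address + local interface."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface_ip))


def count_multicast(group: str = "239.2.11.71", port: int = 8649,
                    iface_ip: str = "0.0.0.0", seconds: float = 30.0) -> dict[str, int]:
    """Listen for `seconds` and return the number of datagrams received per sender IP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Try to share the port with a running gmond (works only if it also set SO_REUSEADDR).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # With iface_ip 0.0.0.0 the kernel picks the interface from the routing table.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, make_mreq(group, iface_ip))
    counts: dict[str, int] = {}
    deadline = time.monotonic() + seconds
    try:
        while (remaining := deadline - time.monotonic()) > 0:
            sock.settimeout(remaining)
            try:
                _, (sender, _) = sock.recvfrom(65535)
            except socket.timeout:
                break
            counts[sender] = counts.get(sender, 0) + 1
    finally:
        sock.close()
    return counts
```

Passing the gmetad host's eth0 address as `iface_ip` makes the kernel join the group on that interface explicitly. If the returned dict only ever contains the local host's own address, that suggests the compute nodes' multicast is not reaching it.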
Cheers,
Sergio
On 19 Feb 2015, at 04:16, Jared David Baker <jared.ba...@uwyo.edu> wrote:
> I posted a while back, stepped away for a bit, and am now coming back to this.
> I’m attempting to get Ganglia working on a cluster and have followed the
> instructions fairly closely (no radical changes). The issue I’m having is that
> nothing is aggregating to gmetad. Sometimes we see the compute nodes for ~2
> minutes; then they disappear and don’t come back until I restart gmond (then
> another 2-minute cycle, etc.). I think it may be related to multicast, but I’m
> not quite sure. Here is some basic output:
>
> [root@l1 ~]# netstat -rn
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
> 192.168.3.0     0.0.0.0         255.255.255.0   U         0 0          0 ib0
> 192.168.1.0     0.0.0.0         255.255.255.0   U         0 0          0 eth0
> 169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth0
> 169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 ib0
> 224.0.0.0       0.0.0.0         240.0.0.0       U         0 0          0 eth0
> 0.0.0.0         192.168.1.250   0.0.0.0         UG        0 0          0 eth0
>
> [root@l1 ~]# netstat -gn
> IPv6/IPv4 Group Memberships
> Interface       RefCnt Group
> --------------- ------ ---------------------
> lo              1      224.0.0.1
> eth0            1      239.2.11.71
> eth0            1      224.0.0.1
> ib0             1      224.0.0.1
> lo              1      ff02::1
> eth0            1      ff02::202
> eth0            1      ff02::1:ff11:fb15
> eth0            1      ff02::1
> eth1            1      ff02::1
> ib0             1      ff02::1:ff3d:1
> ib0             1      ff02::1
>
> [root@l1 ~]# ip link show eth0
> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
>     link/ether 40:16:7e:11:fb:15 brd ff:ff:ff:ff:ff:ff
>
> So I can see that the multicast route is there and that multicast is enabled on
> the interface. If I telnet to the client host from the gmetad server, I get the
> XML data. The switches in the cluster are configured to support multicast (so
> I’m told by our networking team). However, the web server still reports that
> all nodes are down, even though I can clearly see gmond running, both by
> checking the process and by querying the node via telnet. I haven’t seen
> anything relevant in the log file or when running gmetad in debug mode in the
> foreground. Any help would be greatly appreciated.
>
> Thanks!
> --
> Jared
> ------------------------------------------------------------------------------
> Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
> from Actuate! Instantly Supercharge Your Business Reports and Dashboards
> with Interactivity, Sharing, Native Excel Exports, App Integration & more
> Get technology previously reserved for billion-dollar corporations, FREE
> http://pubads.g.doubleclick.net/gampad/clk?id=190641631&iu=/4140/ostg.clktrk
> _______________________________________________
> Ganglia-general mailing list
> Ganglia-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/ganglia-general