I'm guessing that your new Ganglia cluster and your old Ganglia cluster are sending metrics out on the same multicast address.

The fix is easy in one sense, but difficult in another (depending on the nature and quality of your cluster management tools):

Change the multicast IP or port on one of the clusters (in /etc/gmond.conf) and restart gmond on those machines.
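
If your management tools can push a script out to the nodes, the edit could be sketched roughly like this. This is only a sketch under assumptions: I'm assuming the 2.5.x-style directive names mcast_channel and mcast_port, the new address 239.2.11.72 is just an example value, and set_directive is a made-up helper, not anything that ships with Ganglia. Check the directive names against the comments in your own /etc/gmond.conf before using it.

```python
import re


def set_directive(conf_text, name, value):
    """Return conf_text with the directive `name` set to `value`.

    Uncommented lines for that directive are rewritten in place; if there
    are none (for example, only a commented-out default), a new line is
    appended at the end. Commented-out lines are left alone.
    """
    pattern = re.compile(
        r"^[ \t]*" + re.escape(name) + r"[ \t]+.*$", re.MULTILINE
    )
    new_line = f"{name} {value}"
    if pattern.search(conf_text):
        # Use a function as the replacement so the value is inserted literally.
        return pattern.sub(lambda _m: new_line, conf_text)
    separator = "" if conf_text == "" or conf_text.endswith("\n") else "\n"
    return conf_text + separator + new_line + "\n"


def retarget_cluster(conf_text, channel, port):
    """Apply both multicast settings (assumed names: mcast_channel, mcast_port)."""
    conf_text = set_directive(conf_text, "mcast_channel", channel)
    return set_directive(conf_text, "mcast_port", port)
```

You would read /etc/gmond.conf, pass its contents through retarget_cluster(text, "239.2.11.72", 8649), write the result back, and then restart gmond on each node. Changing only the channel or only the port should be enough to separate the two clusters.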

If it's more practical (and you feel like making your network equipment do the work), you could instead set a rule on a network device sitting between the two clusters so that it doesn't forward packets addressed to the Ganglia multicast IP from one cluster to the other.

Not a "better" fix, probably, but it'd tell you if you were on the right track...
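
Another way to check the guess without touching the network gear is to listen on the multicast group from one of the new nodes and see which source addresses show up. Here's a rough sketch, with assumptions: it takes the Ganglia default channel 239.2.11.71 and port 8649 as the values to try, the function names are mine, and depending on how gmond binds its socket you may need to stop gmond on that node (or use a node without it) before the bind succeeds.

```python
import socket


def open_mcast_listener(group, port):
    """Open a UDP socket joined to the given IPv4 multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # ip_mreq: multicast group address followed by the local interface
    # address (0.0.0.0 lets the kernel pick the default interface).
    mreq = socket.inet_aton(group) + socket.inet_aton("0.0.0.0")
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock


def count_senders(sock, max_packets=200, timeout=10.0):
    """Count packets per source IP until max_packets arrive or a read times out."""
    sock.settimeout(timeout)
    counts = {}
    for _ in range(max_packets):
        try:
            _data, (addr, _port) = sock.recvfrom(65535)
        except socket.timeout:
            break
        counts[addr] = counts.get(addr, 0) + 1
    return counts


# Example use (not run automatically):
#   sock = open_mcast_listener("239.2.11.71", 8649)
#   for addr, n in sorted(count_senders(sock).items()):
#       print(addr, n)
```

If addresses from the old 2.0 cluster appear in the output on a new node, the two clusters are sharing a channel, which would fit the log errors quoted below.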

Steve Gilbert wrote:
Hi everyone,
        I inherited a large Ganglia 2.0 installation which I am currently
trying to upgrade to version 2.5.4.  I decided to first roll this out to a
new ~200 node Linux cluster which has never had Ganglia in hopes of
familiarizing myself with this tool.  I installed the gmond RPM on all the
nodes.  I didn't touch the /etc/gmond.conf file at all.
        Once this was done, I telnetted to port 8649 on the localhost and
received the XML dump that I expected...all the hosts were at least listed
in there.  However, when I run gstat, it sees all the nodes as being dead,
and my log files are filling up very fast with stuff like this:

Aug 26 15:32:07 l-sim-205-145 /usr/sbin/gmond[683]: mcast_listen_thread() error: STRANGE type!
Aug 26 15:32:07 l-sim-205-145 /usr/sbin/gmond[683]: mcast_listen_thread() error: STRANGE type!
Aug 26 15:32:07 l-sim-205-145 /usr/sbin/gmond[685]: mcast_listen_thread() xdr_string() error: Interrupted system call
Aug 26 15:32:07 l-sim-205-145 /usr/sbin/gmond[683]: mcast_listen_thread() error: STRANGE type!
Aug 26 15:32:08 l-sim-205-145 /usr/sbin/gmond[683]: mcast_listen_thread() xdr_string() error: Interrupted system call
Aug 26 15:32:09 l-sim-205-145 /usr/sbin/gmond[685]: mcast_listen_thread() xdr_string() error: Interrupted system call
Aug 26 15:32:09 l-sim-205-145 /usr/sbin/gmond[683]: mcast_listen_thread() xdr_string() error: Interrupted system call
Aug 26 15:32:09 l-sim-205-145 /usr/sbin/gmond[683]: mcast_listen_thread() xdr_string() error: Interrupted system call
Aug 26 15:32:10 l-sim-205-145 /usr/sbin/gmond[685]: mcast_listen_thread() xdr_string() error: Interrupted system call
Aug 26 15:32:11 l-sim-205-145 /usr/sbin/gmond[685]: mcast_listen_thread() error: STRANGE type!
Aug 26 15:32:11 l-sim-205-145 /usr/sbin/gmond[685]: mcast_listen_thread() error: STRANGE type!
Aug 26 15:32:11 l-sim-205-145 /usr/sbin/gmond[685]: mcast_listen_thread() xdr_string() error: Interrupted system call
Aug 26 15:32:11 l-sim-205-145 /usr/sbin/gmond[685]: mcast_listen_thread() xdr_string() error: Interrupted system call
Aug 26 15:32:13 l-sim-205-145 /usr/sbin/gmond[683]: mcast_listen_thread() xdr_string() error: Interrupted system call
Aug 26 15:32:13 l-sim-205-145 /usr/sbin/gmond[683]: mcast_listen_thread() xdr_string() error: Interrupted system call
Aug 26 15:32:13 l-sim-205-145 /usr/sbin/gmond[685]: mcast_listen_thread() error: STRANGE type!
Aug 26 15:32:13 l-sim-205-145 /usr/sbin/gmond[685]: mcast_listen_thread() xdr_string() error: Interrupted system call
Aug 26 15:32:13 l-sim-205-145 /usr/sbin/gmond[685]: mcast_listen_thread() xdr_string() error: Interrupted system call
Aug 26 15:32:13 l-sim-205-145 /usr/sbin/gmond[683]: mcast_listen_thread() error: STRANGE type!
Aug 26 15:32:14 l-sim-205-145 /usr/sbin/gmond[683]: mcast_listen_thread() xdr_string() error: Interrupted system call
Aug 26 15:32:15 l-sim-205-145 /usr/sbin/gmond[685]: mcast_listen_thread() error: STRANGE type!


...any ideas what I'm doing wrong?  I'm not very familiar at all with
multicast.  Thanks a lot for any help.

Steve Gilbert
Unix Systems Administrator
[EMAIL PROTECTED]


_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general


