steve-

steve is right on track with the suggestions.  it appears that you are 
running two different versions of ganglia 2.x at the same time.  a quick 
way to get the version of gmond is to run...

% /usr/sbin/gmond --version

(or wherever you installed gmond on each machine).  running that 
command on every machine in the cluster will show you which ones are 
still on the old version and causing the problems.
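
if you have ssh access to all the nodes, the check above can be scripted.  this is only a rough sketch: the helper names (gmond_version, group_by_version) are made up for illustration, and it assumes passwordless ssh and that gmond lives at /usr/sbin/gmond on every host.  it does not try to interpret the version string, it just groups hosts by whatever gmond prints.

```python
import subprocess
from collections import defaultdict

GMOND = "/usr/sbin/gmond"  # assumption: adjust if gmond is installed elsewhere


def gmond_version(host, timeout=10):
    """Run `gmond --version` on `host` over ssh and return what it prints.

    Returns a short error tag instead of raising if the host cannot be reached.
    """
    try:
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", host, GMOND, "--version"],
            capture_output=True, text=True, timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "unreachable ({})".format(type(exc).__name__)
    output = (result.stdout or result.stderr).strip()
    return output if output else "no output (exit {})".format(result.returncode)


def group_by_version(versions):
    """Turn {host: version} into {version: sorted list of hosts}."""
    groups = defaultdict(list)
    for host, version in versions.items():
        groups[version].append(host)
    return {version: sorted(hosts) for version, hosts in groups.items()}
```

calling group_by_version({h: gmond_version(h) for h in hosts}) with your own host list should leave the odd machines out in a small group of their own.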

multicast is a great way to do group messaging, but it is also a victim of
its own success: every member of the group hears every message.  if a single
machine on the multicast channel is not upgraded, it will keep sending
old-format messages to every host in the cluster, and the upgraded hosts
will complain about them (the errors you are seeing in your logs now).
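
to see which machines are actually talking on the channel, you can join the multicast group yourself and count datagrams per source address.  a minimal sketch, assuming the default group and port (239.2.11.71 and 8649); the function names are hypothetical, and it makes no attempt to decode the ganglia messages.  it may fail to bind on a host where gmond already holds the port, so it is probably best run from a machine on the same network that is not running gmond.

```python
import socket
import struct
import time
from collections import Counter

DEFAULT_GROUP = "239.2.11.71"  # default multicast group
DEFAULT_PORT = 8649            # default multicast port


def membership_request(group, interface="0.0.0.0"):
    """Build the ip_mreq structure passed to IP_ADD_MEMBERSHIP."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def count_senders(group=DEFAULT_GROUP, port=DEFAULT_PORT, seconds=10.0):
    """Join `group` and count received datagrams per source IP for `seconds`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    seen = Counter()
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group))
        deadline = time.monotonic() + seconds
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            sock.settimeout(remaining)
            try:
                _data, (addr, _src_port) = sock.recvfrom(65535)
            except socket.timeout:
                break
            seen[addr] += 1
    finally:
        sock.close()
    return seen
```

count_senders() returns a Counter keyed by source IP; comparing that list against the hosts you know are upgraded narrows down where the old-format messages come from.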

if you can't upgrade gmond on some of the machines, you can do as steve 
suggested and place them in a separate multicast group.

the multicast group is changed in /etc/gmond.conf...

mcast_group 239.2.11.78

for example, and the multicast port is changed using

mcast_port  8650

the default multicast group is 239.2.11.71 and the default port is 8649.
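
to double check which group and port a given machine ends up on, something like the following can read a copy of its config.  this is a sketch under assumptions: it treats the file as simple "key value" lines with # comments, the function name multicast_settings is invented for illustration, and the directive names are the ones used in this message (check the comments in your own gmond.conf, since names can differ between versions).  anything not set in the file falls back to the defaults above.

```python
# defaults as stated in this message; keys are the directive names used above
DEFAULTS = {"mcast_group": "239.2.11.71", "mcast_port": "8649"}


def multicast_settings(conf_text, defaults=DEFAULTS):
    """Return the effective multicast settings for the given config text.

    Lines are assumed to look like `key value`; text after `#` is ignored.
    """
    settings = dict(defaults)
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split(None, 1)          # split on any run of whitespace
        if len(parts) == 2 and parts[0] in settings:
            settings[parts[0]] = parts[1].strip().strip('"')
    return settings
```

for example, multicast_settings(open("/etc/gmond.conf").read()) on each node, collected and compared, shows whether the old machines really did move to the separate group.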

good luck
-matt






Today, Steve Gilbert wrote forth saying...

> From: Steve Gilbert <[EMAIL PROTECTED]>
> To: "'[email protected]'"
>     <[email protected]>
> Date: Tue, 26 Aug 2003 15:35:24 -0700
> Subject: [Ganglia-general] New user gmond woes
> 
> Hi everyone,
>       I inherited a large Ganglia 2.0 installation which I am currently
> trying to upgrade to version 2.5.4.  I decided to first roll this out to a
> new ~200 node Linux cluster which has never had Ganglia in hopes of
> familiarizing myself with this tool.  I installed the gmond RPM on all the
> nodes.  I didn't touch the /etc/gmond.conf file at all.
>       Once this was done, I telnet to port 8649 on the localhost and
> received the XML dump that I expected...all the hosts were at least listed
> in there.  However, when I run gstat, it sees all the nodes as being dead,
> and my log files are filling up very fast with stuff like this:
> 
> Aug 26 15:32:07 l-sim-205-145 /usr/sbin/gmond[683]: mcast_listen_thread()
> error: STRANGE type!
> Aug 26 15:32:07 l-sim-205-145 /usr/sbin/gmond[683]: mcast_listen_thread()
> error: STRANGE type!
> Aug 26 15:32:07 l-sim-205-145 /usr/sbin/gmond[685]: mcast_listen_thread()
> xdr_string() error: Interrupted system call
> Aug 26 15:32:07 l-sim-205-145 /usr/sbin/gmond[683]: mcast_listen_thread()
> error: STRANGE type!
> Aug 26 15:32:08 l-sim-205-145 /usr/sbin/gmond[683]: mcast_listen_thread()
> xdr_string() error: Interrupted system call
> Aug 26 15:32:09 l-sim-205-145 /usr/sbin/gmond[685]: mcast_listen_thread()
> xdr_string() error: Interrupted system call
> Aug 26 15:32:09 l-sim-205-145 /usr/sbin/gmond[683]: mcast_listen_thread()
> xdr_string() error: Interrupted system call
> Aug 26 15:32:09 l-sim-205-145 /usr/sbin/gmond[683]: mcast_listen_thread()
> xdr_string() error: Interrupted system call
> Aug 26 15:32:10 l-sim-205-145 /usr/sbin/gmond[685]: mcast_listen_thread()
> xdr_string() error: Interrupted system call
> Aug 26 15:32:11 l-sim-205-145 /usr/sbin/gmond[685]: mcast_listen_thread()
> error: STRANGE type!
> Aug 26 15:32:11 l-sim-205-145 /usr/sbin/gmond[685]: mcast_listen_thread()
> error: STRANGE type!
> Aug 26 15:32:11 l-sim-205-145 /usr/sbin/gmond[685]: mcast_listen_thread()
> xdr_string() error: Interrupted system call
> Aug 26 15:32:11 l-sim-205-145 /usr/sbin/gmond[685]: mcast_listen_thread()
> xdr_string() error: Interrupted system call
> Aug 26 15:32:13 l-sim-205-145 /usr/sbin/gmond[683]: mcast_listen_thread()
> xdr_string() error: Interrupted system call
> Aug 26 15:32:13 l-sim-205-145 /usr/sbin/gmond[683]: mcast_listen_thread()
> xdr_string() error: Interrupted system call
> Aug 26 15:32:13 l-sim-205-145 /usr/sbin/gmond[685]: mcast_listen_thread()
> error: STRANGE type!
> Aug 26 15:32:13 l-sim-205-145 /usr/sbin/gmond[685]: mcast_listen_thread()
> xdr_string() error: Interrupted system call
> Aug 26 15:32:13 l-sim-205-145 /usr/sbin/gmond[685]: mcast_listen_thread()
> xdr_string() error: Interrupted system call
> Aug 26 15:32:13 l-sim-205-145 /usr/sbin/gmond[683]: mcast_listen_thread()
> error: STRANGE type!
> Aug 26 15:32:14 l-sim-205-145 /usr/sbin/gmond[683]: mcast_listen_thread()
> xdr_string() error: Interrupted system call
> Aug 26 15:32:15 l-sim-205-145 /usr/sbin/gmond[685]: mcast_listen_thread()
> error: STRANGE type!
> 
> 
> ...any ideas what I'm doing wrong?  I'm not very familiar at all with
> multicast.  Thanks a lot for any help.
> 
> Steve Gilbert
> Unix Systems Administrator
> [EMAIL PROTECTED]
> 
> 

