I figured it out.  By default, gmond.conf lets the program decide which
interface to use based on the kernel routing table.  For some reason, these
nodes were using my Gigabit adapter instead of my 10/100.  I changed the
gmond.conf file to use the 10/100 interface and they all appeared, like
magic!
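
For anyone hitting the same thing, here is a rough Python sketch of how one
might check which local address the kernel routing table picks for the
Ganglia multicast channel. It assumes the stock channel 239.2.11.71:8649
(not stated in this thread; use whatever your gmond.conf says), and the
function name is made up for illustration. A UDP connect() sends no packets;
it only asks the kernel to choose a route and a source address, which should
usually match the interface gmond ends up on when left to the default.

```python
import socket

# Assumption: 239.2.11.71:8649 is the stock Ganglia multicast channel.
# Substitute the address and port from your own gmond.conf.
GANGLIA_GROUP = "239.2.11.71"
GANGLIA_PORT = 8649


def local_address_for(dest, port=GANGLIA_PORT):
    """Return the local IP the kernel would use as the source for dest.

    connect() on a UDP socket transmits nothing; it only performs the
    route lookup and fixes the socket's local address.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect((dest, port))
        return sock.getsockname()[0]
    finally:
        sock.close()


# Example use on a node (raises OSError if there is no route at all):
#   print(local_address_for(GANGLIA_GROUP))
```

If the address printed on a missing node belongs to the Gigabit adapter
while the working nodes print their 10/100 address, that would be consistent
with what I saw.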

Thanks for responding to my question.

Alex Weeks
Linux Systems Analyst


-----Original Message-----
From: Steve Gilbert <[EMAIL PROTECTED]>
Sent: 10/16/2003 12:36 PM
To: Alexander Weeks/Poughkeepsie/Contr/[EMAIL PROTECTED], [email protected]
Subject: RE: [Ganglia-general] Missing nodes

Some things to try.  They probably won't solve anything by themselves, but
if you report back here on what you find, you can get better help.

1. Log into one of the 8 machines that doesn't show up and run 'gstat
--all'.  What does it tell you?  Do the 8 machines see each other?  Do they
report the rest of the 128 as being down?

2. What do you get when you telnet to port 8649?  Which machines are
reported there?  Do this from the 8 bad machines as well as the ones that
are working.
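
Step 2 can also be scripted instead of reading telnet output by eye. The
sketch below is only an illustration: it assumes gmond answers a TCP
connection on port 8649 with an XML dump containing one HOST element per
node it knows about, each with a NAME attribute. The function names and the
node names in the usage comment are hypothetical.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read everything gmond writes after we connect, until it closes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def hosts_in_dump(xml_bytes):
    """Return the set of NAME attributes of all HOST elements in the dump."""
    root = ET.fromstring(xml_bytes)
    return {host.get("NAME") for host in root.iter("HOST")}


def missing_hosts(expected, xml_bytes):
    """Return the expected node names that the dump does not mention."""
    return sorted(set(expected) - hosts_in_dump(xml_bytes))


# Example use (hypothetical node names):
#   dump = fetch_gmond_xml("node001")
#   print(missing_hosts(["node001", "node002", "node003"], dump))
```

Running it against one working node and one bad node, and comparing the two
lists, gives the same information as the telnet check.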

I would say this is almost definitely a multicast issue of some kind.  Are
all these machines on the same subnet?  What network devices (switches,
routers, whatever) are they using?  Are the 8 bad machines on a different
switch?

Let us know what you find.  I had a very similar problem recently, and it
turned out to be something with our switches and the way they handle
multicast traffic.  I know practically nothing about multicast, so I don't
know the details, but I think our network guy finally did something where
multicast traffic is more or less treated the same as broadcast packets or
something like that.
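
One way to see whether multicast is actually reaching a machine is to join
the group yourself and note which source addresses show up. This is a
minimal Python sketch, not anything from the Ganglia tools: it assumes the
stock channel 239.2.11.71:8649 (check your gmond.conf), does not decode the
packets, and the function names are made up for illustration.

```python
import socket
import struct
import time

# Assumption: stock Ganglia multicast channel; adjust to your gmond.conf.
GROUP = "239.2.11.71"
PORT = 8649


def membership_request(group, iface_ip="0.0.0.0"):
    """Build the ip_mreq value for IP_ADD_MEMBERSHIP.

    iface_ip selects the local interface to join on; 0.0.0.0 leaves the
    choice to the kernel.
    """
    return struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton(iface_ip))


def senders_heard(group=GROUP, port=PORT, iface_ip="0.0.0.0", seconds=30.0):
    """Join the group and return the source IPs heard within `seconds`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                         socket.IPPROTO_UDP)
    # Allow binding even if gmond already holds the port on this machine.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, iface_ip))
    sock.settimeout(1.0)
    seen = set()
    deadline = time.monotonic() + seconds
    try:
        while time.monotonic() < deadline:
            try:
                _payload, (addr, _src_port) = sock.recvfrom(65535)
            except socket.timeout:
                continue
            seen.add(addr)
    finally:
        sock.close()
    return seen


# Example use: print(sorted(senders_heard(seconds=60)))
```

If the bad machines only ever hear each other, or nothing at all, that
points at the switches or at the interface the traffic is going out on.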

Steve Gilbert
Unix Systems Administrator
[EMAIL PROTECTED]


-----Original Message-----
From: Alexander Weeks [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 16, 2003 6:22 AM
To: [email protected]
Subject: [Ganglia-general] Missing nodes






I am trying to use Ganglia on a 128 node cluster.  I have 8 nodes that just
won't show up.  I have confirmed that gmond is running on them, and can
telnet into port 8649.  What can I do to debug this?  What can I check for?

Alex Weeks
Linux Systems Analyst



_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general



