John Francis Lee wrote:
Thanks again!
Setting the debug level to 10 showed me that gmetad was unable to
connect to itself! I changed the datasource specification to 'localhost'
from the machine's FQDN and things worked!
What I get now is
'There are 10 nodes up and running. There are no nodes down.'
But when I click on the pictures of any of the 10 machines I get:
'This node is down'.
I'm still investigating why this is so.
gmetad's connecting to itself? Oh my. That should never happen.
Also, data_source is specified on a per-cluster basis: each line names one
cluster and lists the hosts gmetad can poll for that cluster's data.
In other words:
data_source "one_group_of_servers" server1.farm.net server2.farm.net [...]
There may be a display bug in the released version of the web front-end
that counts hosts that are down as up in the meta and cluster views (the
host view is correct). But that doesn't explain why the hosts appear as
down in the first place.
What's especially creepy is that the hosts are ALL marked as down. If
there were a network problem you would expect n-1 hosts to be down, and 1
host to be up - the one running the gmond that gmetad queried to get the
data in the first place.
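
To check what gmond itself reports, independent of gmetad and the web
front-end, something like the following Python sketch might help. It
assumes gmond is listening on its default XML port (8649) on localhost
and that each HOST element carries NAME, TN and TMAX attributes; the
function names are made up for illustration.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read the XML dump gmond sends to any client that connects.

    Assumption: gmond is on its default TCP port, 8649.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection when the dump is done
                break
            chunks.append(data)
    return b"".join(chunks)


def list_hosts(xml_bytes):
    """Return (name, tn, tmax) for every HOST element in the dump."""
    root = ET.fromstring(xml_bytes)
    return [
        (h.get("NAME"), int(h.get("TN", 0)), int(h.get("TMAX", 0)))
        for h in root.iter("HOST")
    ]
```

Calling list_hosts(fetch_gmond_xml()) on the machine running gmond should
list every host that gmond knows about. If a host is missing there, the
problem is upstream of gmetad.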
In fact, there might be another bug in one of the web front-end tarballs
floating around out there. I think I saw this when I upgraded the web
front-end to 2.5.1: the code starts using TN/TMAX to determine whether a
host is up, but there's a logic error in it. I'm pretty sure it's fixed
in CVS.
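
For reference, the idea behind a TN/TMAX check can be sketched in Python
as below. This is not the front-end's actual code: TN is taken to be the
number of seconds since the host was last heard from and TMAX the
longest expected gap between reports, and the multiplier of 4 is an
assumption chosen for illustration.

```python
def host_is_up(tn, tmax, factor=4):
    """Treat a host as up while its last report is recent enough.

    tn:     seconds since the host was last heard from
    tmax:   longest expected gap between reports, in seconds
    factor: assumed slack multiplier (illustrative, not from the source)
    """
    return tn <= factor * tmax


def count_up_down(hosts, factor=4):
    """Count (up, down) over an iterable of (name, tn, tmax) tuples."""
    hosts = list(hosts)
    up = sum(1 for _, tn, tmax in hosts if host_is_up(tn, tmax, factor))
    return up, len(hosts) - up
```

A check like this is easy to get backwards (for example by flipping the
comparison), which would mark every host as down while the summary
counts still look healthy.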
Try the CVS version of ganglia-webfrontend and see if that fixes it.
[maybe it's time to release 2.5.2?]
(I draw the line at a CVS howto, folks...)