The exact same thing? (hosts report as up until you click on them, then
all hosts report as down, gmetad is reporting live data for all hosts in
the cluster)
Then those bugs are definitely still floating around. Definitely sounds
like a 2.5.2 (2.5.1a?) release would be good right now, at least for the
webfrontend module. Someone should do something. :P
Try upgrading the webfrontend to the current CVS version...
Logan Donaldson wrote:
I observed the exact same thing with my setup. Then I changed the
gmetad.conf file or restarted it and then .... nothing. Back to one host
visible again. "gstat" sez the same.
tcpdump on the server says that packets are flying from the clients over
the multicast channel. But they are all still empty, just the XML with
no metric data.
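One way to confirm the "XML with no metric data" symptom is to pull the XML dump from gmond's TCP port and list the hosts that carry no METRIC elements. A minimal sketch, assuming gmond listens on its default TCP port 8649 and closes the connection after the dump; the function names and the sample document are made up for illustration:

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read the full XML dump gmond sends on connect (assumed default port 8649)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the socket when the dump is complete
                break
            chunks.append(data)
    return b"".join(chunks)


def hosts_without_metrics(xml_data):
    """Return the NAME of every HOST element that has no METRIC child."""
    root = ET.fromstring(xml_data)
    return [h.get("NAME") for h in root.iter("HOST") if h.find("METRIC") is None]


# Hypothetical sample standing in for a real dump.
sample = """<GANGLIA_XML VERSION="2.5.1" SOURCE="gmond">
<CLUSTER NAME="demo">
<HOST NAME="node1" IP="10.0.0.1"><METRIC NAME="load_one" VAL="0.10"/></HOST>
<HOST NAME="node2" IP="10.0.0.2"></HOST>
</CLUSTER>
</GANGLIA_XML>"""

print(hosts_without_metrics(sample))
```

Against a live daemon, something like hosts_without_metrics(fetch_gmond_xml("server1.farm.net")) should list the hosts whose entries arrive empty.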
Again, if anyone is willing to send me the particulars of their working
installation, I'd really appreciate it.
thanks
logan donaldson
[EMAIL PROTECTED]
On Tuesday, January 28, 2003, at 01:02 PM, Steven Wagner wrote:
John Francis Lee wrote:
Thanks again!
Setting the debug level to 10 showed me that gmetad was unable to
connect to itself! I changed the datasource specification to 'localhost'
from the machine's FQDN and things worked!
What I get now is
'There are 10 nodes up and running. There are no nodes down.'
But when I click on the pictures of any of the 10 machines I get:
'This node is down'.
I'm still investigating why this is so.
gmetad's connecting to itself? Oh my. That should never happen.
Also, data_source is on a per-cluster basis.
In other words:
data_source "one_group_of_servers" server1.farm.net server2.farm.net
[...]
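To make the per-cluster point concrete: each data_source line names one cluster and then lists one or more hosts that can answer for that same cluster. A rough sketch of how such a line breaks down, assuming the simple form shown above (a quoted name followed by host or host:port entries) and a default port of 8649; parse_data_source is a hypothetical helper, not part of gmetad:

```python
import shlex


def parse_data_source(line, default_port=8649):
    """Split a data_source line into (cluster_name, [(host, port), ...])."""
    tokens = shlex.split(line)  # honours the quotes around the cluster name
    if len(tokens) < 3 or tokens[0] != "data_source":
        raise ValueError("expected: data_source \"name\" host [host ...]")
    name = tokens[1]
    sources = []
    for tok in tokens[2:]:
        host, sep, port = tok.partition(":")
        sources.append((host, int(port) if sep else default_port))
    return name, sources


print(parse_data_source(
    'data_source "one_group_of_servers" server1.farm.net server2.farm.net'
))
```

All hosts on one line are alternative sources for the same cluster, so machines from different clusters belong on separate data_source lines.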
There may be a display bug in the released version of the web
front-end that counts down hosts as up in the meta and cluster views
(the host view is correct). But that doesn't explain why the hosts
appear as down in the first place.
What's especially creepy is that the hosts are ALL marked as down. If
there were a network problem you would expect n-1 hosts to be down,
and 1 host to be up - the one running the gmond that gmetad queried to
get the data in the first place.
In fact, there might be another bug in one of the webfrontend tarballs
floating around out there. I think I might have seen this when I
upgraded the web-frontend to 2.5.1 - the code starts using TN/TMAX to
determine whether a host is up, but there's a logic error in it.
Pretty sure it's fixed in CVS.
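For reference, the idea behind a TN/TMAX check can be sketched like this. TN is the number of seconds since the host was last heard from and TMAX is the longest expected gap between reports, both taken from the HOST attributes in the XML. The slack factor of 4 and the function name are assumptions for illustration, not a copy of the frontend code:

```python
def host_is_up(tn, tmax, slack=4):
    """Treat a host as up while TN stays within slack * TMAX seconds.

    tn    -- seconds since the host was last heard from
    tmax  -- maximum expected seconds between reports
    slack -- assumed tolerance multiplier (illustrative value)
    """
    return tn <= slack * tmax


print(host_is_up(tn=10, tmax=20))   # heard from recently
print(host_is_up(tn=500, tmax=20))  # silent for far longer than expected
```

If the comparison is inverted, or TN and TMAX are swapped, every host with fresh data would be flagged as down, which is consistent with the symptom described in this thread.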
Try the CVS version of ganglia-webfrontend and see if that fixes it.
[maybe it's time to release 2.5.2?]
(I draw the line at a CVS howto, folks...)
-------------------------------------------------------
This SF.NET email is sponsored by:
SourceForge Enterprise Edition + IBM + LinuxWorld = Something 2 See!
http://www.vasoftware.com
_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general