"Can you be more specific about "for some reason"?" - Certainly: I mean
when the server is hard-down (not pingable). At the moment this is the
case with "app001" (planned maintenance), so I have taken this
opportunity to check whether the data_source entries are set correctly.
 
"Also, what do you see in the syslog on the host running gmetad?" -
Nothing. At least nothing related to Ganglia. 
Running gmetad with debug=9 reveals that the servers in the "App cluster"
are not reporting/updating the way servers in the other clusters do.
Watching the debug output, I see updates coming from the non-app nodes
(currently all of their sources are pingable), and the UI displays the
non-app nodes/clusters just fine. But the "App cluster" and its nodes
are omitted from the UI altogether.
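
One way to confirm whether the cluster is missing from gmetad itself,
and not only from the UI, is to read gmetad's XML dump and list the
clusters and hosts it contains. The following is a minimal sketch, not
something from my setup: it assumes gmetad's xml_port is at its default
of 8651 on localhost, and the function names are hypothetical.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_xml(host="localhost", port=8651, timeout=10.0):
    """Read the XML dump that the daemon writes before it closes the socket.

    The host and port are assumptions (8651 is gmetad's default xml_port).
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # peer closed the connection: dump is complete
                break
            chunks.append(data)
    return b"".join(chunks)


def hosts_by_cluster(xml_bytes):
    """Map each CLUSTER NAME to the sorted list of HOST NAMEs under it."""
    root = ET.fromstring(xml_bytes)
    result = {}
    for cluster in root.iter("CLUSTER"):
        names = [host.get("NAME") for host in cluster.iter("HOST")]
        result[cluster.get("NAME")] = sorted(names)
    return result


if __name__ == "__main__":
    for name, hosts in hosts_by_cluster(fetch_xml()).items():
        print(f"{name}: {len(hosts)} host(s)")
```

If "App cluster" does not appear in this listing at all, the problem
is on the gmetad side and not in the PHP front end.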

"There's a known problem if the data_source node *does* allow a TCP
connection to be successfully made, but gmond does not respond with
the data, or times out." - Hm, that should not be the issue here, since
the source is hard-down and not responding to any requests. But it is
good to know, because I was going to test fault tolerance by stopping
gmond on other hosts.
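
To tell the two failure modes apart (host hard-down versus TCP
connection accepted but no data returned), a small probe can help. This
is a rough sketch under stated assumptions: it assumes gmond listens on
its default TCP port 8649, the function name probe_gmond and the three
status strings are hypothetical, and it only checks for a first chunk
of data without validating the XML.

```python
import socket


def probe_gmond(host, port=8649, timeout=5.0):
    """Classify a gmond endpoint as 'down', 'no_data', or 'ok'.

    'down'    : the TCP connection could not be established
    'no_data' : connected, but nothing arrived before close or timeout
    'ok'      : connected and received at least one chunk of data
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, unreachable, or the connect attempt timed out.
        return "down"
    with sock:
        sock.settimeout(timeout)
        try:
            first = sock.recv(4096)
        except OSError:
            # Includes socket.timeout: the peer accepted but stayed silent.
            return "no_data"
    return "ok" if first else "no_data"


if __name__ == "__main__":
    # "app001" is the host named in this thread; add other hosts as needed.
    for name in ["app001"]:
        print(name, probe_gmond(name))
```

A host that is hard-down should report "down", while a host whose gmond
has hung or been stopped behind an open listener would report
"no_data", which is the case described in the quoted known problem.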

Would it help to have a snapshot of my environment? Nothing exotic: 
SUSE 10 Enterprise, x86_64 (all clients and gmetad servers)
32 GB memory
ganglia 3.0.3
apache2-2.2.3-16.9
php5-5.1.2-29.35
apache2-mod_php5-5.1.2-29.35

I will continue to do some digging and forward my findings. Thanks for
the help and the quick response!

-Jesse

_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general
