"Can you be more specific about "for some reason"?" - Certainly: if the server is hard-down/not-pingable. At the moment this is the case with "app001" (planned maintenance). So I have taken this opportunity to see if the data_source(s) were set correctly. "Also, what do you see in the syslog on the host running gmetad?" - Nothing. At least nothing related to Ganglia. Running gmetad with debug=9 reveals that the servers in this (App cluster) cluster are not reporting/updating like in other clusters. Watching the debug output, I see updates coming from non-app nodes (currently all sources are pingable) and the UI displaying the non-app nodes/cluster just fine. But the cluster/nodes "App cluster" is/are actually omitted from the UI altogether.
"There's a known problem if the data_source node *does* allow a TCP connection to be successfully made, but gmond does not respond with the data, or times out." - hm, should not be the issue here since the source is not responding to any requests as it is hard-down. But this is good to know because I was going to test fault-tolerance by stopping the gmond on another host(s). Would it help to have a snapshot of my environment? Nothing exotic: SuSe 10 Enterprise, x86_64 (all clients and gmetad servers) 32GB mem ganglia 3.0.3 apache2-2.2.3-16.9 php5-5.1.2-29.35 apache2-mod_php5-5.1.2-29.35 I will continue to do some digging and forward my findings. Thanks for the help and the quick response! -Jesse ------------------------------------------------------------------------------ This SF.net email is sponsored by: SourcForge Community SourceForge wants to tell your story. http://p.sf.net/sfu/sf-spreadtheword _______________________________________________ Ganglia-general mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/ganglia-general

