Dear All,

This is the first time I have posted to the list, but I have made good use of the archives on many occasions. Unfortunately I can't find anything in the archives to help with my current problem.

I am monitoring a grid consisting of clusters at three institutions called POL, BAS and ESSC. The clusters are all from the same supplier and use the same convention for slave node IP addresses and host names. All the clusters are behind their own institutional firewalls. My Ganglia Web frontend is at the following address:

http://www.resc.reading.ac.uk/ganglia/

My problem is that the POL cluster report mixes up nodes from all three clusters. The POL cluster is listed as "NEMO cluster @ POL" on the grid report page of my Web frontend. There are three main problems with the POL cluster report:

1) Nodes at ESSC and BAS with names not found at POL usually show up as blank spaces on the POL cluster page unless they are down, in which case they are represented by the usual pink box
2) The load level colouring (and hence the positioning on the page) of nodes that have the same name as nodes in other clusters is often governed by the other clusters
3) The overview section of the POL cluster report has incorrect values for load percentages and number of CPUs etc.

Here is an excerpt from my gmetad.conf file showing the three data sources. The host names have been changed for security reasons.

data_source "POL's gmond" 65 pol.host.name:8649
data_source "ESSC's gmond" 60 essc.host.name:8649
data_source "BAS's gmond through SSH tunnel" 70 localhost:8647

Here is some more information I think may be relevant.

-- The ESSC cluster is on the same subnet as my Web frontend server
-- There are no problems with the ESSC and BAS cluster reports
-- The XML data received from POL's gmond is correct
-- My gmetad version is 3.0.3, but I get the same problem on my backup gmetad machine which still has version 2.5.7
-- POL's gmond is version 3.0.3, but ESSC and BAS have gmond version 2.5.7
-- Accessing POL's gmond through a different port via an SSH tunnel (i.e.
localhost:8648 instead of pol.host.name:8649) makes no difference
-- Changing the order of the data sources in gmetad.conf makes no difference
-- Removing either the ESSC or the BAS data source makes no difference; the POL cluster report still gets mixed up with the other cluster, whichever one it is
-- Deleting all the RRD files in /var/lib/ganglia/rrds/ and starting again makes no difference
-- The grid report page has correct values for the POL cluster

I could change the host names and IP addresses of the ESSC cluster nodes, but that wouldn't stop the POL cluster report getting confused with BAS nodes, and changing those clusters is not an option. Is there any way to solve this problem without making the node names of all the clusters different?

All suggestions would be gratefully received. I hope I haven't missed something obvious.

-Dan Bretherton.

--
Mr. D.A. Bretherton
Environmental Systems Science Centre
Harry Pitt Building
3 Earley Gate
Reading University
Reading, RG6 6AL
UK
Tel. +44 118 378 7722

_______________________________________________
Ganglia-general mailing list
https://lists.sourceforge.net/lists/listinfo/ganglia-general

