Dear All,

Here are some updates to the message I posted to the list yesterday:

1) The XML data from gmetad seems to be correct. I got this data from
"telnet localhost 8651". I can't see any incorrect nodes listed under POL,
so I now suspect the Web frontend rather than gmetad.

2) The data in /var/lib/ganglia/rrds seems to be correct. There are no
incorrect nodes listed in the directory for the POL cluster. This also
points the finger at the Web frontend.

3) I have tried out the latest versions of gmetad and the Web frontend
(3.0.5) with the latest version of rrdtool (1.2.23) on another computer to
make sure the problem is not being caused by a bug that has been fixed. I
found that the same problem occurs with the latest versions, so I have left
the public server (http://www.resc.reading.ac.uk/ganglia/) on ganglia
version 3.0.3 and rrdtool version 1.2.15.

4) The BAS cluster nodes actually have different IP addresses to the POL
nodes of the same name, so the IP addresses are not the cause of the BAS
nodes being listed in the POL cluster report.

Regards,
-Dan.

On Tuesday 30 Oct 2007, you wrote:
> Dear All,
>
> This is the first time I have posted to the list, but I have made good use
> of the archives on many occasions. Unfortunately I can't find anything in
> the archives to help with my current problem.
>
> I am monitoring a grid consisting of clusters at three institutions called
> POL, BAS and ESSC. The clusters are all from the same supplier and use the
> same convention for slave node IP addresses and host names. All the
> clusters are behind their own institutional firewalls. My Ganglia Web
> frontend is at the following address:
> http://www.resc.reading.ac.uk/ganglia/
>
> My problem is that the POL cluster report mixes up nodes from all three
> clusters. The POL cluster is listed as "NEMO cluster @ POL" on the grid
> report page of my Web frontend.
> There are three main problems with the POL cluster report:
> 1) Nodes at ESSC and BAS with names not found at POL usually show up as
> blank spaces on the POL cluster page unless they are down, in which case
> they are represented by the usual pink box
> 2) The load level colouring (and hence the positioning on the page) of
> nodes that have the same name as nodes in other clusters is often governed
> by the other clusters
> 3) The overview section of the POL cluster report has incorrect values for
> load percentages and number of CPUs etc.
>
> Here is an excerpt from my gmetad.conf file showing the three data sources.
> The host names have been changed for security reasons.
>
> data_source "POL's gmond" 65 pol.host.name:8649
> data_source "ESSC's gmond" 60 essc.host.name:8649
> data_source "BAS's gmond through SSH tunnel" 70 localhost:8647
>
> Here is some more information I think may be relevant.
> -- The ESSC cluster is on the same subnet as my Web frontend server
> -- There are no problems with the ESSC and BAS cluster reports
> -- The XML data received from POL's gmond is correct
> -- My gmetad version is 3.0.3, but I get the same problem on my backup
> gmetad machine which still has version 2.5.7
> -- POL's gmond is version 3.0.3, but ESSC and BAS have gmond version 2.5.7
> -- Accessing POL's gmond through a different port via an SSH tunnel (i.e.
> localhost:8648 instead of pol.host.name:8649) makes no difference
> -- Changing the order of the data sources in gmetad.conf makes no
> difference
> -- Removing either the ESSC or the BAS data source makes no
> difference; the POL cluster report still gets mixed up with the other
> cluster, whichever one it is
> -- Deleting all the RRD files in /var/lib/ganglia/rrds/ and starting again
> makes no difference
> -- The grid report page has correct values for the POL cluster
>
> I could change the host names and IP addresses of the ESSC cluster nodes,
> but that wouldn't stop the POL cluster report getting confused with BAS
> nodes and changing those clusters is not an option. Is there any way to
> solve this problem without making the node names of all the clusters
> different? All suggestions would be gratefully received. I hope I haven't
> missed something obvious.
>
> -Dan Bretherton.

--
Mr. D.A. Bretherton
Environmental Systems Science Centre
Harry Pitt Building
3 Earley Gate
Reading University
Reading, RG6 6AL
UK
Tel. +44 118 378 7722

_______________________________________________
Ganglia-general mailing list
https://lists.sourceforge.net/lists/listinfo/ganglia-general
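
The check described in update 1) above (reading the XML from gmetad on port 8651 and looking for nodes listed under the wrong cluster) could also be scripted rather than done by eye. The following is only a minimal sketch, not part of Ganglia: the function names are made up for illustration, and it assumes the XML dump contains CLUSTER elements with HOST children, each carrying a NAME attribute. It reports host names that appear in more than one cluster, which is the situation that seems to confuse the POL cluster report.

```python
import socket
import xml.etree.ElementTree as ET
from collections import defaultdict


def fetch_gmetad_xml(host="localhost", port=8651, timeout=10.0):
    """Read the full XML dump that gmetad writes to its XML port.

    The defaults mirror "telnet localhost 8651" from the message above.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmetad closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def hosts_by_cluster(xml_bytes):
    """Map each cluster name to the set of host names listed under it.

    Assumes CLUSTER and HOST elements with NAME attributes.
    """
    root = ET.fromstring(xml_bytes)
    clusters = {}
    for cluster in root.iter("CLUSTER"):
        names = clusters.setdefault(cluster.get("NAME"), set())
        for host in cluster.iter("HOST"):
            names.add(host.get("NAME"))
    return clusters


def shared_host_names(clusters):
    """Return {host name: [cluster names]} for names used by 2+ clusters."""
    seen = defaultdict(list)
    for cluster_name, hosts in sorted(clusters.items()):
        for host_name in hosts:
            seen[host_name].append(cluster_name)
    return {name: found for name, found in seen.items() if len(found) > 1}
```

On the gmetad machine, `shared_host_names(hosts_by_cluster(fetch_gmetad_xml()))` would list every node name shared between clusters. If the XML puts each node under the right cluster, as observed in update 1), but shared names still exist, that is consistent with the suspicion that the mix-up happens in the Web frontend.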

