Dan,
Could you post the relevant snippets from gmond.conf on your cluster
nodes? What is the XML output from gmond on the POL cluster?
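
If it helps, the dumps can be compared with a short script rather than by
eye. The following is only a rough Python sketch, not something from the
Ganglia distribution. It assumes the usual Ganglia XML layout (CLUSTER
elements containing HOST elements with NAME and IP attributes), and the
function names are made up for illustration. It reads the XML from a gmond
or gmetad port, lists the hosts per cluster, and reports host names that
appear in more than one cluster.

```python
import socket
import xml.etree.ElementTree as ET
from collections import defaultdict


def fetch_xml(host, port, timeout=10.0):
    """Read the full XML dump; gmond and gmetad send it and then close."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # peer closed the connection: dump is complete
                break
            chunks.append(data)
    return b"".join(chunks)


def hosts_by_cluster(xml_bytes):
    """Map each CLUSTER NAME to a list of (host name, IP) pairs."""
    root = ET.fromstring(xml_bytes)
    clusters = defaultdict(list)
    for cluster in root.iter("CLUSTER"):
        for host in cluster.iter("HOST"):
            clusters[cluster.get("NAME")].append(
                (host.get("NAME"), host.get("IP"))
            )
    return dict(clusters)


def shared_host_names(clusters):
    """Return host names that occur in more than one cluster."""
    seen = defaultdict(set)
    for cluster_name, hosts in clusters.items():
        for host_name, _ip in hosts:
            seen[host_name].add(cluster_name)
    return {name: sorted(cs) for name, cs in seen.items() if len(cs) > 1}


if __name__ == "__main__":
    # 8651 is the gmetad port queried with telnet in the quoted message;
    # for a single gmond, use that gmond's own host and port instead.
    clusters = hosts_by_cluster(fetch_xml("localhost", 8651))
    for cluster_name, hosts in sorted(clusters.items()):
        print(cluster_name, len(hosts), "hosts")
    for host_name, names in sorted(shared_host_names(clusters).items()):
        print(host_name, "appears in:", ", ".join(names))
```

Running it against the gmetad port should show whether the overlapping
names are limited to the node names that POL, BAS and ESSC share.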
Thanks,
Matthias
On Wed, 31 Oct 2007, Dan Bretherton wrote:
> Dear All,
>
> Here are some updates to the message I posted to the list yesterday:
>
> 1) The XML data from gmetad seems to be correct. I got this data from
> "telnet localhost 8651". I can't see any incorrect nodes listed under POL,
> so I now suspect the Web frontend rather than gmetad.
>
> 2) The data in /var/lib/ganglia/rrds seems to be correct. There are no
> incorrect nodes listed in the directory for the POL cluster. This also
> points the finger at the Web frontend.
>
> 3) I have tried out the latest versions of gmetad and the Web frontend
> (3.0.5) with the latest version of rrdtool (1.2.23) on another computer to
> make sure the problem is not being caused by a bug that has been fixed. I
> found that the same problem occurs with the latest versions, so I have left
> the public server (http://www.resc.reading.ac.uk/ganglia/) on ganglia
> version 3.0.3 and rrdtool version 1.2.15.
>
> 4) The BAS cluster nodes actually have different IP addresses to the POL
> nodes of the same name, so the IP addresses are not the cause of the BAS
> nodes being listed in the POL cluster report.
>
> Regards,
> -Dan.
>
> On Tuesday 30 Oct 2007, you wrote:
> > Dear All,
> >
> > This is the first time I have posted to the list, but I have made good use
> > of the archives on many occasions. Unfortunately I can't find anything in
> > the archives to help with my current problem.
> >
> > I am monitoring a grid consisting of clusters at three institutions called
> > POL, BAS and ESSC. The clusters are all from the same supplier and use the
> > same convention for slave node IP addresses and host names. All the
> > clusters are behind their own institutional firewalls. My Ganglia Web
> > frontend is at the following address:
> > http://www.resc.reading.ac.uk/ganglia/
> >
> > My problem is that the POL cluster report mixes up nodes from all three
> > clusters. The POL cluster is listed as "NEMO cluster @ POL" on the grid
> > report page of my Web frontend. There are three main problems with the POL
> > cluster report:
> > 1) Nodes at ESSC and BAS with names not found at POL usually show up as
> > blank spaces on the POL cluster page unless they are down, in which case
> > they are represented by the usual pink box
> > 2) The load level colouring (and hence the positioning on the page) of
> > nodes that have the same name as nodes in other clusters is often governed
> > by the other clusters
> > 3) The overview section of the POL cluster report has incorrect values for
> > load percentages and number of CPUs etc.
> >
> > Here is an excerpt from my gmetad.conf file showing the three data sources.
> > The host names have been changed for security reasons.
> >
> > data_source "POL's gmond" 65 pol.host.name:8649
> > data_source "ESSC's gmond" 60 essc.host.name:8649
> > data_source "BAS's gmond through SSH tunnel" 70 localhost:8647
> >
> > Here is some more information I think may be relevant.
> > -- The ESSC cluster is on the same subnet as my Web frontend server
> > -- There are no problems with the ESSC and BAS cluster reports
> > -- The XML data received from POL's gmond is correct
> > -- My gmetad version is 3.0.3, but I get the same problem on my backup
> > gmetad machine which still has version 2.5.7
> > -- POL's gmond is version 3.0.3, but ESSC and BAS have gmond version 2.5.7
> > -- Accessing POL's gmond through a different port via an SSH tunnel (i.e.
> > localhost:8648 instead of pol.host.name:8649) makes no difference
> > -- Changing the order of the data sources in gmetad.conf makes no
> > difference
> > -- Removing either the ESSC or the BAS data source makes no
> > difference; the POL cluster report still gets mixed up with the other
> > cluster, whichever one it is
> > -- Deleting all the RRD files in /var/lib/ganglia/rrds/ and starting again
> > makes no difference
> > -- The grid report page has correct values for the POL cluster
> >
> > I could change the host names and IP addresses of the ESSC cluster nodes,
> > but that wouldn't stop the POL cluster report getting confused with BAS
> > nodes and changing those clusters is not an option. Is there any way to
> > solve this problem without making the node names of all the clusters
> > different? All suggestions would be gratefully received. I hope I haven't
> > missed something obvious.
> >
> > -Dan Bretherton.
>
> --
> Mr. D.A. Bretherton
> Environmental Systems Science Centre
> Harry Pitt Building
> 3 Earley Gate
> Reading University
> Reading, RG6 6AL
> UK
>
> Tel. +44 118 378 7722
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc.
> Still grepping through log files to find problems? Stop.
> Now Search log events and configuration files using AJAX and a browser.
> Download your FREE copy of Splunk now >> http://get.splunk.com/
> _______________________________________________
> Ganglia-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/ganglia-general
>