[Ganglia-general] gmetad mixing up nodes from different clusters

Dan Bretherton Tue, 30 Oct 2007 11:59:10 -0800

Dear All,

This is the first time I have posted to the list, but I have made good use of 
the archives on many occasions.  Unfortunately I can't find anything in the 
archives to help with my current problem.


I am monitoring a grid consisting of clusters at three institutions called 
POL, BAS and ESSC.  The clusters are all from the same supplier and use the 
same convention for slave node IP addresses and host names.  All the clusters 
are behind their own institutional firewalls.  My Ganglia Web frontend is at 
the following address:
http://www.resc.reading.ac.uk/ganglia/

My problem is that the POL cluster report mixes up nodes from all three 
clusters.  The POL cluster is listed as "NEMO cluster @ POL" on the grid 
report page of my Web frontend. There are three main problems with the POL 
cluster report:
1) Nodes at ESSC and BAS with names not found at POL usually show up as blank 
spaces on the POL cluster page unless they are down, in which case they are 
represented by the usual pink box.
2) The load level colouring (and hence the positioning on the page) of nodes 
that have the same name as nodes in other clusters is often governed by the 
other clusters.
3) The overview section of the POL cluster report has incorrect values for 
load percentages, number of CPUs etc.

Here is an excerpt from my gmetad.conf file showing the three data sources.  
The host names have been changed for security reasons.

data_source "POL's gmond" 65 pol.host.name:8649
data_source "ESSC's gmond" 60 essc.host.name:8649
data_source "BAS's gmond through SSH tunnel" 70 localhost:8647

Here is some more information I think may be relevant.
-- The ESSC cluster is on the same subnet as my Web frontend server
-- There are no problems with the ESSC and BAS cluster reports
-- The XML data received from POL's gmond is correct
-- My gmetad version is 3.0.3, but I get the same problem on my backup gmetad 
machine which still has version 2.5.7
-- POL's gmond is version 3.0.3, but ESSC and BAS have gmond version 2.5.7
-- Accessing POL's gmond through a different port via an SSH tunnel (i.e. 
localhost:8648 instead of pol.host.name:8649) makes no difference
-- Changing the order of the data sources in gmetad.conf makes no difference
-- Removing either the ESSC or the BAS data source makes no difference; the 
POL cluster report still gets mixed up with whichever of the two clusters 
remains
-- Deleting all the RRD files in /var/lib/ganglia/rrds/ and starting again 
makes no difference
-- The grid report page has correct values for the POL cluster
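
One way to see exactly which node names collide between the clusters is to pull 
the XML from each gmond and compare the HOST names. The following is only a 
rough sketch, not something taken from Ganglia itself: the function names 
(fetch_gmond_xml, host_names, shared_host_names) are made up for illustration, 
it assumes each gmond answers a plain TCP connection with an XML dump whose 
HOST elements carry a NAME attribute, and the host names and ports are the 
placeholder values from the data_source lines above.

```python
import socket
import xml.etree.ElementTree as ET
from collections import defaultdict


def fetch_gmond_xml(host, port, timeout=10.0):
    """Read the whole XML dump that gmond writes on its TCP port (as bytes)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection when the dump is done
                break
            chunks.append(data)
    return b"".join(chunks)


def host_names(xml_data):
    """Return the set of HOST NAME attributes found in one gmond XML dump."""
    root = ET.fromstring(xml_data)
    return {host.get("NAME") for host in root.iter("HOST")}


def shared_host_names(names_by_source):
    """Map every host name seen in more than one source to those sources."""
    seen = defaultdict(list)
    for source, names in names_by_source.items():
        for name in names:
            seen[name].append(source)
    return {name: sorted(srcs) for name, srcs in seen.items() if len(srcs) > 1}


if __name__ == "__main__":
    # Placeholder endpoints, copied from the (anonymised) gmetad.conf excerpt.
    sources = {
        "POL": ("pol.host.name", 8649),
        "ESSC": ("essc.host.name", 8649),
        "BAS": ("localhost", 8647),
    }
    names = {
        label: host_names(fetch_gmond_xml(host, port))
        for label, (host, port) in sources.items()
    }
    for name, clusters in sorted(shared_host_names(names).items()):
        print(name, clusters)
```

Run on the gmetad machine, this would print each node name that appears in more 
than one cluster together with the clusters that report it, which should match 
the nodes that look wrong on the POL cluster page if the name clash is the cause.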

I could change the host names and IP addresses of the ESSC cluster nodes, but 
that wouldn't stop the POL cluster report getting confused with BAS nodes, and 
changing the POL and BAS clusters is not an option.  Is there any way to solve this 
problem without making the node names of all the clusters different?  All 
suggestions would be gratefully received.  I hope I haven't missed something 
obvious.

-Dan Bretherton.

-- 
Mr. D.A. Bretherton
Environmental Systems Science Centre
Harry Pitt Building
3 Earley Gate
Reading University
Reading, RG6 6AL
UK

Tel. +44 118 378 7722

