I'm trying to revamp an existing Ganglia setup, hopefully making it more 
organized and open to future expansion, but I'm running into an issue.

Here's the setup (all nodes running 3.0.7):

   "webby" is the machine with a gmetad and the web frontend.

   "headnode1" is the headnode of cluster #1, also with gmetad.

   "n01" ... "n03" are nodes in cluster #1, on a private network, each with 
gmond.

(Looking ahead, I have another cluster I'd like to add to this grid, and 
possibly more.)

I set up the "gmond.conf" files on the nodes by creating a default conf file 
and then editing it, setting the cluster name.

Started 'gmond' on the three nodes, and 'telnet localhost 8649' produces the 
XML that I expect to see: stats from all three nodes, wrapped in a <CLUSTER> 
tag with the name I set.
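
For anyone who wants to script the same check instead of using telnet, here is a minimal sketch. The function names `fetch_xml` and `summarize` are made up for illustration, and the code assumes only what the telnet test shows: the daemon dumps its XML on connect and then closes the connection, with hosts reported as <HOST> elements inside <CLUSTER>.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_xml(host, port=8649, timeout=5.0):
    """Read everything the daemon sends until it closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # empty read: the peer closed the socket
                break
            chunks.append(data)
    return b"".join(chunks)


def summarize(xml_bytes):
    """Return {cluster name: sorted host names} from a Ganglia XML dump."""
    root = ET.fromstring(xml_bytes)
    return {
        cluster.get("NAME"): sorted(h.get("NAME") for h in cluster.iter("HOST"))
        for cluster in root.iter("CLUSTER")
    }
```

Calling `summarize(fetch_xml("localhost"))` on a node should then list the cluster name and the hosts reporting into it.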

So far, so good.

"headnode1" gets "gmetad" with a "data_source" line like this:

   data_source "grAPHics" n01.sub.domain.edu

and one more modification from the default, adding the FQDN of "webby" in the 
"trusted_hosts" section:

   trusted_hosts webby.domain.edu

If I type "telnet n01 8649" from the "headnode1" command line, I see the XML 
stream I expect, the same one I saw when I ran the command on "n01" itself.

I'm also seeing the collected stats in /var/lib/ganglia/rrds.

Issue #1: running "gmetad -d10" on "headnode1" gives me this message:

Going to run as user ganglia
Sources are ...
Source: [grAPHics, step 15] has 1 sources
        10.0.25.1
tcp_listen() on xml_port failed: Address already in use
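
"Address already in use" means some process is already bound to the port gmetad wants for its xml_port. A rough way to confirm that from a script is sketched below; `port_in_use` is a hypothetical helper, and the port to test is an assumption (8651 is the stock gmetad xml_port, so substitute whatever gmetad.conf actually sets).

```python
import errno
import socket


def port_in_use(port, host="0.0.0.0"):
    """Return True if binding to (host, port) fails with EADDRINUSE."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise  # some other failure (e.g. permissions), not what we test for
    finally:
        sock.close()
    return False
```

If `port_in_use(8651)` returns True before the debug run is started, something else (possibly an already-running gmetad) is holding the port.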

On to "webby," which I'm *hoping* will serve up the graphed statistics for not 
only this cluster but other clusters as well.

Its "gmetad.conf" file has a similar "data_source" line:

   data_source "grAPHics" headnode1.domain.edu

The web frontend "conf.php" has the "$ganglia_ip" and "$ganglia_port" values 
set, but that's definitely not right.

Here, we have bigger problems / issues:

   - running "gmetad -d10" yields the following:

     Going to run as user ganglia
     Sources are ...
     Source: [grAPHics, step 15] has 1 sources
            aaa.bbb.ccc.ddd <-- the correct IP address for "headnode1"
     tcp_listen() on xml_port failed: Address already in use

   - can't telnet to "headnode1" on 8649 / 8650 / 8651 to retrieve XML

   - /var/log/messages contains lots of these:

     <date> <time> webby /usr/sbin/gmetad[29411]: data_thread() got no answer from any [grAPHics] datasource

   - the web frontend, such as it is, displays no graphs.
     <http://fusion.cs.washington.edu/ganglia/?m=&r=hour&s=descending&hc=4>
     ('fusion.cs' is "webby")
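
The telnet attempts in the list above can be repeated in one pass with a small probe. This is only a sketch: `probe` is a made-up name, and it reports nothing beyond whether a TCP connection to each port succeeds from the machine it runs on.

```python
import socket


def probe(host, ports, timeout=3.0):
    """Map each port to True if a TCP connection succeeds, else False."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:  # refused, timed out, unreachable, unresolvable
            results[port] = False
    return results
```

Run on "webby" as `probe("headnode1.domain.edu", [8649, 8650, 8651])`, it shows which of the three ports accept a connection at all.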

So.

I've got questions:

   - do I need to run a webserver on 'headnode1' so that 'webby' can retrieve 
XML from it?
   - do I have the 'gmond.conf' files set up correctly on the nodes?
   - do I have the 'gmetad.conf' files set up correctly on the head node and on 
the web frontend?
   - how do the 'gmetad' processes communicate with each other?

Can someone please help me?

Thank you.

-- 
Stephen Spencer | [email protected] | 206-616-3281
Graphics System Engineer, UW Computer Science Department
Chair, ACM SIGGRAPH Publications Committee
