Vu, Phuong A (MP Technology) wrote:
This is probably very obvious, and I have read some of the previous discussions on this topic, but the answers given do not seem to work for me
at all, and the documentation does not make it obvious to me how to proceed further.

Scenario: I have, say, 32 Linux boxes on the same core switch, no routing involved, but would like to partition them into, say, 2 clusters of 16 machines each, with gmetad running on another Linux box, say XYZ. I gave each cluster a name as defined in the /etc/gmond.conf file on the individual machines, and use that name as the data_source in the /etc/gmetad.conf file on XYZ:

data_source "cluster1" ... here I list the names of the 16 machines on the 1st cluster
data_source "cluster2" ... here I list the names of the 16 machines on the 2nd cluster

When all else fails, go to the source!

When you telnet to localhost:8651 on XYZ (the host running the web front-end and the metadaemon), you should see two open <CLUSTER> tags (grep for it!) - one for cluster1, one for cluster2.

If you see one, then gmetad is having trouble reaching the hosts in one of the two data sources. If you see none, it's having trouble reaching both of them. If you see "Connection refused," gmetad is not running or it's crashing on startup.
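
If you'd rather not count tags by eye, here is a minimal Python sketch of the same check. It assumes gmetad dumps its whole XML tree and then closes the connection, and that each opening <CLUSTER> tag carries a NAME attribute; the function names fetch_dump and cluster_names are made up for this example and are not part of Ganglia.

```python
import re
import socket


def fetch_dump(host="localhost", port=8651, timeout=5.0):
    """Read everything the daemon sends until it closes the connection.

    Raises ConnectionRefusedError if nothing is listening on the port.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # empty read: the peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def cluster_names(xml_text):
    """Return the NAME attribute of every opening <CLUSTER ...> tag.

    Assumption: NAME appears inside the opening tag. DTD lines such as
    <!ELEMENT CLUSTER ...> do not start with "<CLUSTER" and are skipped.
    """
    return re.findall(r'<CLUSTER\s[^>]*?\bNAME="([^"]*)"', xml_text)
```

Run on XYZ, cluster_names(fetch_dump()) should return both cluster1 and cluster2. One name or an empty list corresponds to the cases above, and a ConnectionRefusedError means gmetad isn't listening.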

Even if your RRD permissions are all jacked up on the front-end, you should be able to get a basic web page up.

Make sure that conf.php in the web front-end is connecting to localhost:8651.

Also, make sure that your monitoring cores (gmond) trust the box running the front-end. Otherwise, gmetad will not be able to connect to them.

You can verify that they trust the box by telnetting to $MEMBER_OF_CLUSTER port 8649.

If you see the connection accepted and then closed immediately, you have a trust issue (add your front-end box's IP to /etc/gmond.conf on the cluster machines you intend to poll).

If the connection is refused, you may need to check the monitoring core's configuration (is it listening on that port? Configured as mute?).

If XML data goes shooting across your xterm, you win a prize - that could mean you have a weird gmetad XML parsing error on your hands (with custom metrics, this can happen...). Your prize is, you get to run gmetad with debug output on! :)
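
The same three-way check can be scripted if you have a lot of cluster members to poke at. This is only a sketch: probe and classify are hypothetical helper names, and treating "the first bytes start with <" as "looks like XML" is my assumption about what a healthy monitoring core sends. Run it from the front-end box, since trust is decided by the source address.

```python
import socket


def classify(data):
    """Map the first bytes read from the port to one of the cases above."""
    if not data:
        return "closed"  # accepted, then closed with no data: likely a trust issue
    if data.lstrip().startswith(b"<"):
        return "xml"     # the monitoring core is answering; look at gmetad next
    return "other"       # something is listening, but it doesn't look like XML


def probe(host, port=8649, timeout=5.0):
    """Connect to a cluster member and report what happened."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            data = sock.recv(4096)
    except ConnectionRefusedError:
        return "refused"  # not listening on that port, or configured as mute
    except OSError as exc:  # timeouts, unreachable hosts, resets, ...
        return "error: %s" % exc
    return classify(data)
```

Looping probe over the 16 hostnames of each cluster gives a quick table of which members are "closed", "refused", or "xml".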

1. Unless I also start gmond on XYZ, I won't be able to see anything on the web front-end. I don't want XYZ to be part of the monitored clusters.

The monitoring core probably shouldn't be running on the front-end. The metadaemon should be enough.

I know that in a previous reply Steven Wagner said that this should work, but I am not able to get it to behave that way. Am I missing something very obvious?

You know, gentle readers, you really shouldn't assume that because my setup works, anyone else's will. The stuff I did to get my Ganglia setup working went way beyond messing with the configs. Especially when it comes to the metadaemon...

By this point in Ganglia 2.x's development, if someone is going through the kind of stuff I went through on any of the currently-supported platforms, something is wrong...

