Vu, Phuong A (MP Technology) wrote:
This is probably very obvious, and I have read some of the previous
discussions on this topic, but the answers given do not seem to work for me
at all, and the documentation does not make it obvious how to proceed further.
Scenario: I have, say, 32 Linux boxes on the same core switch, no routing
involved, but would like to partition them into, say, 2 clusters of 16
machines each, running gmetad on another Linux box, say XYZ. I gave each
cluster a name as defined in the /etc/gmond.conf file on the individual
machines, and use it as the data_source in the /etc/gmetad.conf file on XYZ:
data_source "cluster1" ... here I list the names of the 16 machines in the
1st cluster
data_source "cluster2" ... here I list the names of the 16 machines in the
2nd cluster
When all else fails, go to the source!
When you telnet to localhost:8651 on XYZ (the host running the web
front-end and the metadaemon), you should see two open <CLUSTER> tags (grep
for it!) - one for cluster1, one for cluster2.
If you see only one, then gmetad is having trouble reaching the hosts in one
of the two data sources. If you see none, it's having trouble reaching both
of them. If you see "Connection refused," gmetad is not running or it's
crashing on startup.
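
If you'd rather script that check than eyeball telnet output, here is a
rough Python sketch. It assumes gmetad dumps its XML on port 8651 and then
closes the connection, and that "cluster1" and "cluster2" are the names used
in gmetad.conf. The helper names (read_all, cluster_names, missing_clusters)
are made up for this example; they are not part of Ganglia.

```python
import re
import socket


def read_all(host, port, timeout=10.0):
    """Connect, read until the peer closes the connection, return the text."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # empty read: the peer closed the connection
                break
            chunks.append(data)
    # latin-1 maps every byte to a character, so decoding cannot fail;
    # we only look for ASCII tag names afterwards.
    return b"".join(chunks).decode("latin-1")


def cluster_names(xml_text):
    """Return the NAME attribute of every opening <CLUSTER ...> tag."""
    return re.findall(r'<CLUSTER\s[^>]*?\bNAME="([^"]*)"', xml_text)


def missing_clusters(xml_text, expected):
    """Return the expected cluster names that do not appear in the XML."""
    seen = set(cluster_names(xml_text))
    return [name for name in expected if name not in seen]


if __name__ == "__main__":
    try:
        xml_text = read_all("localhost", 8651)
    except ConnectionRefusedError:
        print("Connection refused: gmetad is not running or crashed on startup")
    else:
        missing = missing_clusters(xml_text, ["cluster1", "cluster2"])
        print("missing clusters:", missing or "none")
```

Run it on XYZ itself. Any name it reports as missing is a data source that
gmetad could not reach.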
Even if your RRD permissions are all jacked up on the front-end, you should
be able to get a basic web page up.
Make sure that conf.php in the web front-end is connecting to localhost:8651.
Also, make sure that your monitoring cores trust the box running the
front-end. Otherwise, gmetad will not be able to connect to them.
You can verify that they trust the box by telnetting to $MEMBER_OF_CLUSTER
port 8649. If you see the connection accepted and then closed immediately,
you have a trust issue (add your front-end box's IP to /etc/gmond.conf on
the cluster machines you intend to poll). If the connection is refused,
you may need to check the monitoring core's configuration (is it listening
on that port? Configured as mute?). If XML data goes shooting across your
xterm, you win a prize - that could mean you have a weird gmetad XML
parsing error on your hands (with custom metrics, this can happen... ).
Your prize is, you get to run gmetad with debug output on! :)
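
The three outcomes above (refused, accepted then closed at once, XML
streaming back) can be told apart with a few lines of Python. This is only a
sketch: it assumes gmond listens on TCP port 8649 as described, and the
names probe, classify and HINTS are invented for the example. A timeout
(for instance a firewall silently dropping packets) is not handled and will
raise.

```python
import socket
import sys

# Advice for each outcome, paraphrased from the message above.
HINTS = {
    "refused": "check gmond's configuration: listening on that port? mute?",
    "closed": "trust issue: add the front-end's IP to /etc/gmond.conf",
    "data": "gmond is answering; try running gmetad with debug output on",
}


def classify(connected, first_bytes):
    """Map what happened on the socket to one of the three outcomes."""
    if not connected:
        return "refused"
    return "data" if first_bytes else "closed"


def probe(host, port=8649, timeout=5.0):
    """Connect to gmond and report 'refused', 'closed' or 'data'."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return classify(False, b"")
    with sock:
        try:
            first = sock.recv(4096)
        except ConnectionResetError:
            first = b""  # treat a reset like an immediate close
    return classify(True, first)


if __name__ == "__main__":
    # Usage: pass one or more cluster member hostnames as arguments.
    for host in sys.argv[1:]:
        outcome = probe(host)
        print(f"{host}: {outcome} -> {HINTS[outcome]}")
```

Run it from the front-end box, since the trust check depends on which
address the connection comes from.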
1. Unless I also start gmond on XYZ, I won't be able to see anything on the
webfrontend.
I don't want XYZ to be part of the monitored clusters.
The monitoring core probably shouldn't be running on the front-end. The
metadaemon should be enough.
I know in a previous reply, Steven Wagner has said that this should work, but
I am not able to get it to behave that way. Am I missing something very
obvious?
You know, gentle readers, you really shouldn't assume that because my setup
works, anyone else's will. The stuff I did to get my Ganglia setup working
went way beyond messing with the configs. Especially when it comes to the
metadaemon...
By this point in Ganglia 2.x's development, if someone is going through the
kind of stuff I went through on any of the currently-supported platforms,
something is wrong...