Thanks again, Steven, for the pointer. I was able to get it to work either by turning all_trusted on or by setting the trusted host to the IP of the web front-end machine. I did not think to look there at first, since it seemed to be able to connect when I used the default multicast IP address. Now I don't even have to run gmond on the front-end, which is what I expected. What I had earlier was kind of confusing, but it seems to work now. Very nice tool.
Btw, since documentation is so sparse, if you all need help writing some user documentation for the next release, just let me know.

Thanks!
Phuong

-----Original Message-----
From: Steven Wagner [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 15, 2003 7:43 PM
To: Vu, Phuong A (MP Technology)
Cc: '[email protected]'
Subject: Re: [Ganglia-general] How to setup multiple clusters using different multicast IP

Vu, Phuong A (MP Technology) wrote:
> This is probably very obvious, and I have read some of the previous
> discussions on this topic but the answer given does not seem to work for me
> at all, and the documentation is not obvious for me how to proceed further.
>
> Scenario: I have say 32 Linux boxes on the same core switch, no routing
> involved, but would like to partition them out into say 2 clusters of 16
> machines each and running gmetad on another Linux box, say XYZ. I gave each
> cluster a name as defined in the /etc/gmond.conf file on the individual
> machine, and use as data_source in the /etc/gmetad.conf file on XYZ
>
> data_source "cluster1" ... here I list the name of the 16 machines on the
> 1st cluster
> data_source "cluster2" ... here I list the name of the 16 machines on the
> 2nd cluster

When all else fails, go to the source!

When you telnet to localhost:8651 on XYZ (the host running the web front-end and the metadaemon), you should see two open <CLUSTER> tags (grep for it!) - one for cluster1, one for cluster2. If you see one, then gmetad is having trouble reaching the hosts in one of the two data sources. If you see none, it's having trouble reaching both of them. If you see "Connection refused," gmetad is not running or it's crashing on startup.

Even if your RRD permissions are all jacked up on the front-end, you should be able to get a basic web page up. Make sure that conf.php in the web front-end is connecting to localhost:8651.

Also, make sure that your monitoring cores trust the box running the front-end.
Otherwise, they will not be able to connect. You can verify that they trust the box by telnetting to $MEMBER_OF_CLUSTER port 8649. If you see the connection accepted and then closed immediately, you have a trust issue (add your front-end box's IP to /etc/gmond.conf on the cluster machines you intend to poll). If the connection is refused, you may need to check the monitoring core's configuration (is it listening on that port? Configured as mute?). If XML data goes shooting across your xterm, you win a prize - that could mean you have a weird gmetad XML parsing error on your hands (with custom metrics, this can happen... ). Your prize is, you get to run gmetad with debug output on! :)

> 1. Unless I also start gmond on XYZ, I won't be able to see anything on the
> webfrontend.
> I don't want XYZ to be part of the monitored clusters.

The monitoring core probably shouldn't be running on the front-end. The metadaemon should be enough.

> I know in previous reply, Steven Wagner has said that this should work, but
> I am not able to get it to behave that way. Am I missing something very
> obvious ?

You know, gentle readers, you really shouldn't assume that because my setup works, anyone else's will. The stuff I did to get my Ganglia setup working went way beyond messing with the configs. Especially when it comes to the metadaemon...

By this point in Ganglia 2.x's development, if someone is going through the kind of stuff I went through on any of the currently-supported platforms, something is wrong...
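
The two telnet checks described in this thread (port 8651 on the gmetad host, port 8649 on a cluster member) can also be scripted. What follows is only a rough sketch in Python, not anything shipped with Ganglia: the function names fetch_all, probe and count_open_cluster_tags are made up for illustration, and the code assumes the daemon writes its XML and then closes the connection, which is how the telnet sessions above are described.

```python
import re
import socket


def fetch_all(host, port, timeout=5.0):
    """Connect to host:port and return everything sent until the peer closes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # an empty read means the peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def probe(host, port, timeout=5.0):
    """Return (status, text); status is 'refused', 'closed' or 'data'.

    Timeouts and other socket errors are not handled here and propagate
    to the caller.
    """
    try:
        text = fetch_all(host, port, timeout)
    except ConnectionRefusedError:
        return "refused", ""
    if not text.strip():
        return "closed", ""  # accepted, then closed without sending anything
    return "data", text


def count_open_cluster_tags(xml_text):
    """Count opening <CLUSTER ...> tags, like grepping the telnet output."""
    return len(re.findall(r"<CLUSTER[\s>]", xml_text))
```

Run on XYZ, probe("localhost", 8651) returning "refused" corresponds to the "gmetad is not running or it's crashing on startup" case, and with two data sources count_open_cluster_tags on the returned text should give 2. Against a cluster member, probe(member, 8649) returning "closed" corresponds to the trust issue described above, and "refused" to the case where the monitoring core's configuration needs checking.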

