Bob,

Make sure that you have lines similar to the following in your gmond.conf:

/* channel to send multicast on mcast_channel:mcast_port */
udp_send_channel {
 mcast_join = "239.2.11.71"
 port = "8649"
}

/* channel to receive multicast from mcast_channel:mcast_port */
udp_recv_channel {
 mcast_join = "239.2.11.71"
 port = "8649"
}

tcp_accept_channel {
 port = "8649"
 timeout = -1
}

You asked if there is a tool to validate the data. The answer to that question is yes: you are that tool. Telnet to both node1 and node2 on port 8649. In the XML you get back from node1, make sure that the section for node2 roughly matches the section for node1. Then look at what node2's own XML is telling you by telnetting to it as well. If you are having issues with multicast, you will probably not see node2 in node1's output. If you cannot get ganglia to work, send your gmond.conf as well as the XML output from one of the nodes.
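If you would rather not compare the XML by eye, the same check can be scripted. The following is only a rough sketch, not an official ganglia tool: it assumes gmond dumps its XML and then closes the connection on the tcp_accept_channel port (8649 above), and that the dump contains HOST elements with nested METRIC elements, each with a NAME attribute. The function names are made up for this example.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read the whole XML dump gmond writes on its tcp_accept_channel."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # assumption: gmond closes the socket after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def metrics_by_host(xml_bytes):
    """Map each HOST name in a dump to the set of METRIC names it reports."""
    root = ET.fromstring(xml_bytes)
    return {
        host.get("NAME"): {m.get("NAME") for m in host.iter("METRIC")}
        for host in root.iter("HOST")
    }


def compare_views(views):
    """Given {node: metrics_by_host(...)}, list the hosts each node cannot see."""
    all_hosts = set()
    for seen in views.values():
        all_hosts.update(seen)
    return {node: sorted(all_hosts - set(seen)) for node, seen in views.items()}


def report(nodes, port=8649):
    """Query every node and print what each one knows about the others."""
    views = {n: metrics_by_host(fetch_gmond_xml(n, port)) for n in nodes}
    for node, missing in compare_views(views).items():
        print(node, "is missing:", missing or "nothing")
        for host, metrics in sorted(views[node].items()):
            print("   ", host, "has", len(metrics), "metrics")
    return views
```

Calling report(["node1", "node2"]) would print one block per node. A host that is listed with 0 metrics, or that is missing from the other node's view altogether, points to the multicast traffic not getting through.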

Ian

bob flynn wrote:

Ian,

Thanks for the response. I have stopped the ganglia daemons everywhere and am bringing up each node individually to assist in troubleshooting.

I have modified gmetad.conf as follows:

data_source "linux cluster" 10 node1:8649
# data_source "solaris cluster" 10 localhost:8650
gridname "Leop LSF"

I have commented out the solaris cluster to avoid confusing the issue.

I have brought up node1 and it sees it perfectly.

I then brought up node2 and, while it sees its name, there are no details about:

- OS
- Memory
- CPUs, etc.

when I do a physical view of the node.

I have not modified gmetad.conf to include this node, as my understanding is:

gmetad simply needs to reference some of the gmond nodes in the cluster, as they replicate data between them for redundancy. This works as long as multicasting is functioning between the nodes.

Is this correct? If it is, are there any tools to validate that the data collected on node2 has been successfully replicated to node1, i.e. that the multicasting is functioning?

Cheers,

-Bob



My understanding is gmetad should be running only on the heard
Ian Cunningham wrote:

Bob,

I would definitely recommend this series of actions:
1. stop (kill) gmond on all nodes
2. restart the gmetad process
3. start gmond on all nodes

This will ensure that neither the nodes nor gmetad pick up any stale data from the old config files, and that they are all using the new config files.

Second, make sure that you have two different gmond.conf files, one for each cluster. In your email, you show that gmetad is connecting to two different ports on the same machine (localhost, which you say is linux). From my understanding, you should be connecting to the head node (whatever its name is) for the solaris cluster and not localhost.

Try those suggestions,
Ian

bob flynn wrote:

Hi, I am attempting to set up ganglia for the first time. Having read the documentation and looked at a couple of examples, I still have a few queries. Let me explain what I am attempting, what I have configured and what I see.

What I am attempting.

To configure two clusters.

What I have configured.

1. web front end server, with the following installed;

apache
ganglia-gmetad-3.0.0-1
ganglia-web-3.0.0-1

The gmetad conf file /etc/gmetad.conf contains the following entries;

data_source "linux cluster" 10 localhost:8649
data_source "solaris cluster" 10 localhost:8650
gridname "Leop LSF"

everything else remains as default.

2. client machine. ( Linux box )

ganglia-gmond-3.0.0-1

The gmond conf file /etc/gmond.conf contains the following entries;

cluster {
 name = "linux cluster"
}

What I am seeing is;

A number of machines in the "unspecified" cluster. How do I blow these away? I understand that if I either restart all daemons, or else enter

globals {
   host_dmax = 3600
 }

in the gmond.conf, it should take care of this. The thing is, some nodes appear on both lists, i.e. in both "linux cluster" and "unspecified".

The other thing I am seeing for the hosts visible in the "linux cluster" cluster is that they are showing as down. Yet when I run a

telnet localhost 8649

I see the xml output. I figure I should be able to see the aggregate data on the head node, i.e. the one with gmetad and the php web interface installed. How do I debug this? I have not gone down the road of installing the solaris binaries until I have this sorted. Any help appreciated.

Tks,

-Bob


