Bob,
Make sure that you have lines similar to the following in your gmond.conf:
/* channel to send multicast on mcast_channel:mcast_port */
udp_send_channel {
mcast_join = "239.2.11.71"
port = "8649"
}
/* channel to receive multicast from mcast_channel:mcast_port */
udp_recv_channel {
mcast_join = "239.2.11.71"
port = "8649"
}
tcp_accept_channel {
port = "8649"
timeout = -1
}
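If you want to check the multicast channel itself, here is a minimal Python sketch that joins the same group and port as the config above and reports which IPs it hears packets from. The function names are made up for illustration, and it assumes the default interface is the one gmond multicasts on. It does not decode the packets, it only shows who is sending.

```python
import socket
import struct

MCAST_GROUP = "239.2.11.71"  # mcast_join value from the gmond.conf above
MCAST_PORT = 8649            # port value from the gmond.conf above


def build_membership_request(group: str) -> bytes:
    """Pack an ip_mreq: the group address plus 0.0.0.0 (any local interface)."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))


def listen_for_gmond_packets(group=MCAST_GROUP, port=MCAST_PORT,
                             timeout=30.0, max_packets=20):
    """Collect sender IPs until max_packets have arrived or no packet
    arrives for `timeout` seconds, whichever happens first."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Allow sharing the port with another listener where the OS permits it.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_membership_request(group))
    sock.settimeout(timeout)
    senders = set()
    try:
        for _ in range(max_packets):
            _data, (addr, _port) = sock.recvfrom(65535)
            senders.add(addr)
    except socket.timeout:
        pass  # quiet channel: return whatever was seen so far
    finally:
        sock.close()
    return senders
```

Running listen_for_gmond_packets() on node1 while gmond is up on node2 should return a set containing node2's IP if multicast is getting through. If the bind fails because the local gmond already holds the port, stop gmond on the listening node for the duration of the test.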
You asked if there is a tool to validate the data. The answer to that
question is yes: you are that tool. Telnet to both node1 and node2 (on
port 8649, the tcp_accept_channel above). Start with node1 and make
sure that the section for node2 roughly matches the section for node1
in the XML you get back. Then look at what node2's XML is telling you by
telnetting to it as well. If you are having issues with multicast, you
will probably not see node2 in node1's output. If you cannot get
ganglia to work, send your gmond.conf as well as the XML output from one
of the nodes.
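If comparing the XML by eye gets tedious, the same check can be sketched in Python. This is only a rough sketch: the function names are hypothetical, and it assumes the dump uses HOST elements with a NAME attribute containing METRIC elements with a NAME attribute, which is what gmond's output looks like over telnet.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host: str, port: int = 8649, timeout: float = 10.0) -> bytes:
    """Read the whole XML dump gmond writes to a client on its TCP port."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            chunk = sock.recv(65536)
            if not chunk:  # gmond closes the connection when the dump is done
                break
            chunks.append(chunk)
    return b"".join(chunks)


def metrics_by_host(xml_bytes: bytes) -> dict:
    """Map each HOST NAME to the set of METRIC NAMEs reported for it."""
    root = ET.fromstring(xml_bytes)
    return {
        host.get("NAME"): {m.get("NAME") for m in host.iter("METRIC")}
        for host in root.iter("HOST")
    }


def missing_metrics(xml_bytes: bytes, reference: str, other: str) -> set:
    """Metrics present for `reference` but absent for `other` in one dump."""
    hosts = metrics_by_host(xml_bytes)
    return hosts.get(reference, set()) - hosts.get(other, set())
```

For example, missing_metrics(fetch_gmond_xml("node1"), "node1", "node2") lists what node1 knows about itself but has not heard from node2. A long list, or node2 missing from metrics_by_host() altogether, points at multicast.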
Ian
bob flynn wrote:
Ian,
Thanks for the response. I have stopped ganglia daemons everywhere
and am bringing up each node individually to assist
in troubleshooting.
I have modified gmetad.conf as follows;
data_source "linux cluster" 10 node1:8649
# data_source "solaris cluster" 10 localhost:8650
gridname "Leop LSF"
I have commented out the solaris cluster to avoid confusing the issue.
I have brought up node1 and it sees it perfectly.
I then brought up node2, and while it sees its name, there are no
details about:
- OS
- Memory
- CPUs, etc.
when I do a physical view of the node.
I have not modified gmetad.conf to include this node, as my
understanding is that gmetad simply needs to reference some of the
gmond nodes in the cluster, since they replicate data between them for
redundancy. This works as long as multicasting is functioning between
the nodes.
Is this correct? If it is, are there any tools to validate that the
data collected on node2 has been successfully replicated to node1, i.e.
that multicasting is functioning?
Cheers,
-Bob
My understanding is gmetad should be running only on the heard
Ian Cunningham wrote:
Bob,
I would definitely recommend this series of actions:
1. stop (kill) gmond on all nodes
2. restart the gmetad process
3. start gmond on all nodes
This will ensure that neither the nodes nor gmetad hold on to stale
data from old config files, and that they are all using the new config
files.
Second, make sure that you have two different gmond.confs, one for
each cluster. In your email, you show that gmetad is connecting to
two different ports on the same machine (localhost, which you say is
linux). From my understanding, you should be connecting to the head
node (whatever its name is) for the solaris cluster and not localhost.
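To double-check which hosts and ports gmetad will actually poll, here is a small Python sketch that reads the data_source lines out of a gmetad.conf. It assumes the usual layout of a quoted name, an optional polling interval, then one or more host:port entries. The function name is made up, and the fallback to port 8649 when no port is given is my understanding of the default rather than something I have verified against 3.0.0.

```python
import shlex


def parse_data_sources(conf_text: str) -> list:
    """Return (name, interval, [(host, port), ...]) for each data_source line."""
    sources = []
    for line in conf_text.splitlines():
        # comments=True drops anything after '#', so commented-out
        # data_source lines are ignored.
        tokens = shlex.split(line, comments=True)
        if len(tokens) < 3 or tokens[0] != "data_source":
            continue
        name, rest = tokens[1], tokens[2:]
        interval = None  # None means no interval was given on the line
        if rest[0].isdigit():
            interval, rest = int(rest[0]), rest[1:]
        endpoints = []
        for item in rest:
            host, sep, port = item.partition(":")
            # Assumption: a missing port means the default gmond port, 8649.
            endpoints.append((host, int(port) if sep else 8649))
        sources.append((name, interval, endpoints))
    return sources
```

Run against the gmetad.conf from your first email, this would show both clusters pointing at localhost, on ports 8649 and 8650, which is the thing to fix for the solaris cluster.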
Try those suggestions,
Ian
bob flynn wrote:
Hi, I am attempting to set up ganglia for the first time. Having read
the documentation and looked at a couple of examples, I still have
a few queries. Let me explain what I am attempting, what I have
configured and what I see.
What I am attempting.
To configure two clusters.
What I have configured.
1. web front end server, with the following installed;
apache
ganglia-gmetad-3.0.0-1
ganglia-web-3.0.0-1
The gmetad conf file /etc/gmetad.conf contains the following entries;
data_source "linux cluster" 10 localhost:8649
data_source "solaris cluster" 10 localhost:8650
gridname "Leop LSF"
everything else remains as default.
2. client machine. ( Linux box )
ganglia-gmond-3.0.0-1
The gmond conf file /etc/gmond.conf contains the following entries;
cluster {
name = "linux cluster"
}
What I am seeing is
a number of machines in the "unspecified" cluster. How do I blow these
away? I understand that if I either restart all daemons, or else enter
globals {
host_dmax = 3600
}
in the gmond.conf, it should take care of this. The thing is, some
nodes appear on both lists, i.e. in both "linux cluster" and "unspecified".
The other thing I am seeing for the hosts visible in the "linux
cluster" cluster is that they are showing as down. Yet when I run a
telnet localhost 8649
I see the XML output. I figure I should be able to see the aggregate
data on the head node, i.e. the one with gmetad and the PHP web
interface installed. How do I debug this? I am holding off on
installing the solaris binaries until I have this sorted.
Any help appreciated.
Tks,
-Bob