gmond (in pre_process_node()) assumes that every node participating in
the multicast has a hostname that is resolvable on EVERY other node.
This assumption is flawed: in the common case the master node of a
cluster is NOT a full-blown DNS server, and the slaves know nothing about
the external network that the master node is connected to.
But this is just a side effect of the fact that when gmond is started on
the master, which has 2 interfaces (eth0 - cluster network, eth1 -
external network), gmond uses the hostname of eth1 even though I start
gmond with:
gmond --mcast_if eth0
Now, eth1 carries the default route on the master; is that why gmond is
using that hostname?
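
For reference, on a multihomed host the interface used for outgoing
multicast is normally selected per socket with the IP_MULTICAST_IF option;
when it is not set, the kernel typically picks the interface from the
routing table, which can mean the default-route interface. The following
is only a minimal sketch of that generic sockets mechanism, not gmond's
actual code. The helper name open_mcast_sender is made up for illustration.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not from gmond): open a UDP socket whose outgoing
 * multicast is pinned to the interface that owns if_ip (dotted quad).
 * Returns the socket descriptor, or -1 on error. */
int open_mcast_sender(const char *if_ip)
{
    struct in_addr if_addr;
    int fd;

    assert(if_ip != NULL);
    if (inet_pton(AF_INET, if_ip, &if_addr) != 1)
        return -1;                  /* not a valid IPv4 address */

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    /* Without this option the kernel chooses the outgoing interface
     * from the routing table instead. */
    if (setsockopt(fd, IPPROTO_IP, IP_MULTICAST_IF,
                   &if_addr, sizeof(if_addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With the addresses in this setup, open_mcast_sender("192.168.0.1") would
pin multicast traffic to eth0 on the master.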
The workaround that I'm using right now is to start gmond on the master
as muted, i.e.: gmond --mcast_if eth0 --mute
If I don't, all slave nodes get a ton of messages in /var/log/messages like:
Jan 2 02:32:36 fire30 /usr/sbin/gmond[1501]: gethostbyaddr error:
(remote_ip=192.168.254.141) A temporary error occured on an authoritative
name server.
NOTE: eth0 on master is 192.168.0.1
eth1 on master is 192.168.254.141
So why, if I start with --mcast_if eth0 on the master, does gmond grab
the hostname from eth1?
Ideas?
Mike