On Mon, May 12, 2008 at 08:33:12PM -0700, Jeremy LaTrasse wrote:
>
> 1 or more hosts in a cluster lose track of 1 or more other members [TN
> goes beyond 20 and the graphs go completely out of whack]. Occasionally a
> gmond stop + gmetad stop, wait ~5 minutes, turn them all back on, and
> things look ok for ~10 minutes. Then some hosts are reported as having
> reported more than 20 seconds ago.
This can only be explained by a multicast propagation problem. Can you move to
unicast?
The simplest setup is to point all the nodes in each cluster at one of
them (the collector) with:
udp_send_channel {
  host = <collector ip>
  port = 8649
}
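On the collector itself, gmond also has to listen for those unicast packets
and answer gmetad's polls. A minimal sketch, assuming port 8649 is used for
both and no mcast_join line is left in the receive channel:
udp_recv_channel {
  port = 8649
}
tcp_accept_channel {
  port = 8649
}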
Then point your gmetad at that "collector".
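For example, assuming a cluster named "mycluster" (a placeholder, use your own
cluster name) and the same port as above, the gmetad.conf entry would look
roughly like:
data_source "mycluster" <collector ip>:8649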
Carlo
PS. In your configuration you are forcing the interface to use; that shouldn't
be needed with unicast.
_______________________________________________
Ganglia-general mailing list
https://lists.sourceforge.net/lists/listinfo/ganglia-general