On Mon, May 12, 2008 at 08:33:12PM -0700, Jeremy LaTrasse wrote:
> 
>    1 or more hosts in a cluster lose track of 1 or more other members [TN
>    goes beyond 20 and the graphs go completely out of whack]. Occasionally a
>    gmond stop + gmetad stop, wait ~5 minutes, turn them all back on, and
>    things look ok for ~10 minutes. Then some hosts are reported as having
>    reported more than 20 seconds ago.

This can only be explained by a multicast propagation problem. Can you move to
unicast?

The simplest setup is to point all the nodes in each cluster at one of
them (the collector) with:

udp_send_channel {
  host = <collector ip>
  port = 8649
}
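
For that to work, the collector's own gmond has to listen for those packets
and answer gmetad's polls. A minimal sketch of the matching sections on the
collector, assuming you keep port 8649 for both (adjust if yours differs):

udp_recv_channel {
  port = 8649
}

tcp_accept_channel {
  port = 8649
}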

Then point your gmetad at that "collector".
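
In gmetad.conf that would be something along these lines, one data_source
line per cluster ("my cluster" is only a placeholder name, and the port is
assumed to be the collector's TCP port, 8649 here):

data_source "my cluster" <collector ip>:8649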

Carlo

PS. Your configuration forces the interface to use; that shouldn't
be needed with unicast.
