Hi,

We are using Ganglia 3.1.1 to monitor our distributed system.
Recently we have run into an issue where some metrics report an
abnormally large TN.

Our configuration context:
-  We have several gmond instances deployed on each node within a node
group.
-  Gmond is configured to use multicast mode, so each gmond holds
all metrics for all hosts within the node group.
   Note: the issue also appears when gmond is configured in unicast mode.

The symptoms are:
-  TN may become too large when we reboot one of our nodes.
-  The TN error patterns differ between gmonds. That is, some
gmonds are completely OK, while others show different levels of
errors.
-  The TN error patterns also differ for each host. Metrics from
a single host may be OK on some gmonds while having a large TN
on others.

We dumped the packets received by one of the nodes. It did receive
the respective metrics from all other hosts (and from itself), but the
metrics returned by "telnet localhost 8649" were not restored to
normal. So we guess that either the kernel dropped these packets or
gmond was unable to handle them.
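
To make the comparison between gmonds easier than reading the telnet
output by hand, something like the following could be run on each node.
It is only a rough sketch: the function names (fetch_gmond_xml,
stale_metrics) and the "TN greater than 4 x TMAX" threshold are our own
assumptions, not anything defined by Ganglia. It relies only on the
HOST and METRIC elements and their NAME, TN and TMAX attributes as they
appear in the XML that gmond serves on port 8649.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read the full XML dump that gmond writes on its TCP port.

    Equivalent to "telnet host 8649": gmond sends the document and
    then closes the connection, so we read until EOF.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def stale_metrics(xml_bytes, factor=4):
    """Return (host, metric, TN, TMAX) for metrics whose TN looks too large.

    "Too large" is an assumed rule of thumb: TN > factor * TMAX.
    Metrics without a positive TMAX are skipped.
    """
    root = ET.fromstring(xml_bytes)
    stale = []
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            tn = int(metric.get("TN", 0))
            tmax = int(metric.get("TMAX", 0))
            if tmax > 0 and tn > factor * tmax:
                stale.append((host.get("NAME"), metric.get("NAME"), tn, tmax))
    return stale
```

Calling stale_metrics(fetch_gmond_xml("localhost")) on every node and
comparing the results would show which host/metric pairs each gmond
considers stale, which is the per-gmond difference described above.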

Our questions:
-  How does gmond update a metric once its TN is already too
large, for whatever reason?
-  Any ideas on why we are seeing all of these symptoms?

Thanks very much for any input.

Best Regards,
Hang Zhou

_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general