Hi,

We are using Ganglia 3.1.1 to monitor our distributed system. Recently we have run into an issue where some metrics show an excessively large TN (the number of seconds since the metric was last updated).
Our configuration:
- We have several gmond instances deployed across the nodes of a node group.
- Gmond is configured to use multicast mode, so each gmond holds all metrics for all hosts within the node group.

Note: the issue also appears when gmond is configured in unicast mode.

The symptoms are:
- TN may become too large when we reboot one of our nodes.
- The TN error pattern differs from one gmond to another. That is, some gmonds are completely OK, while others show varying levels of errors.
- The TN error pattern differs from host to host. The metrics from a single host may be OK on some gmonds while having a large TN on others.

We captured the packets received by one of the nodes. It did receive the respective metrics from all other hosts (and from itself), but the metrics returned by "telnet localhost 8649" did not return to normal. So we guess that either the kernel dropped these packets or gmond was unable to process them.

Our questions:
- How does gmond update a metric once its TN has already become too large, for whatever reason?
- Any ideas on why we see all of these symptoms?

Thanks very much for any input.

Best Regards,
Hang Zhou

_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general
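For anyone who wants to compare the TN pattern across several gmonds, the check done by hand with "telnet localhost 8649" can be scripted. The following is only a rough sketch: it assumes the XML dump contains HOST and METRIC elements carrying NAME, TN and TMAX attributes, and the function names and the threshold factor of 4 are arbitrary choices made for illustration, not anything defined by gmond.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read the XML dump that gmond writes on its TCP port (same data as telnet)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # the peer closed the connection: dump is complete
                break
            chunks.append(data)
    return b"".join(chunks)


def find_stale_metrics(xml_data, factor=4):
    """Return (host, metric, tn, tmax) for every metric with TN > factor * TMAX.

    `factor` is a hypothetical threshold chosen for this sketch.
    """
    root = ET.fromstring(xml_data)
    stale = []
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            tn = int(metric.get("TN", 0))
            tmax = int(metric.get("TMAX", 0))
            # Skip metrics without a usable TMAX to avoid flagging everything.
            if tmax > 0 and tn > factor * tmax:
                stale.append((host.get("NAME"), metric.get("NAME"), tn, tmax))
    return stale


# Example use (the host name "node1" is a placeholder):
#   for row in find_stale_metrics(fetch_gmond_xml("node1")):
#       print(row)
```

Running this against each gmond in the node group and comparing the output should show which hosts and metrics are stale on which gmond.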

