So far I've found that the RRD_update errors can be caused by three things:

1. duplicate cluster names
2. server times not in sync
3. running gmetad in "non-scalable" mode when chaining gmetads together

The first two can show up for any of the clusters mentioned in the error log. The third, however, only affects the last cluster listed in any grid (telnet to port 8651 on a downstream gmetad and grep for CLUSTER). If the last cluster in the output is always the one mentioned in the error log, you need to either turn scalability back on (it is on by default) or try this patch...

--- old/process_xml.c   2011-04-12 18:59:49.000000000 +0100
+++ new/process_xml.c   2011-04-12 23:39:23.000000000 +0100
@@ -1156,7 +1156,7 @@ end (void *data, const char *el)
       {
          case GRID_TAG:
             rc = endElement_GRID(data, el);
-            /* No break. */
+            if (!gmetad_config.scalable_mode) break;
          case CLUSTER_TAG:
             rc = endElement_CLUSTER(data, el);

Regards,
Nick
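
P.S. If you would rather script the "telnet to port 8651 and grep for CLUSTER" check, here is a rough Python sketch. It is not part of Ganglia: the helper names fetch_xml and last_cluster_per_grid are made up for illustration, and it assumes the XML dump carries a NAME attribute on each GRID and CLUSTER tag. It simply reports the last cluster seen inside each grid, which you can then compare against the cluster named in the RRD_update errors.

```python
import re
import socket

# Opening and closing GRID/CLUSTER tags; everything else in the dump is ignored,
# much like grepping the telnet output for those tag names.
_TAG = re.compile(r'<(/?)(GRID|CLUSTER)\b([^>]*)>')
_NAME = re.compile(r'\bNAME="([^"]*)"')


def fetch_xml(host, port=8651, timeout=10.0):
    """Read the XML dump from the given port until the peer closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    # latin-1 maps every byte to a character, so decoding cannot fail.
    return b"".join(chunks).decode("latin-1")


def last_cluster_per_grid(xml_text):
    """Return {grid name: name of the last CLUSTER opened inside that grid}."""
    result = {}
    open_grids = []  # names of the GRID elements currently open, innermost last
    for closing, tag, attrs in _TAG.findall(xml_text):
        if tag == "GRID":
            if closing:
                if open_grids:
                    open_grids.pop()
            else:
                match = _NAME.search(attrs)
                open_grids.append(match.group(1) if match else "")
        elif tag == "CLUSTER" and not closing and open_grids:
            match = _NAME.search(attrs)
            if match:
                # Later clusters overwrite earlier ones, so the last one wins.
                result[open_grids[-1]] = match.group(1)
    return result
```

To use it, call last_cluster_per_grid(fetch_xml("your-downstream-gmetad")) with your own host name in place of the placeholder. If the cluster it reports for a grid is always the one in the error log, cause 3 is the likely culprit.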