So far I've found that the RRD_update errors can be caused by three things:

1. duplicate cluster names
2. server clocks not in sync
3. running gmetad in "non-scalable" mode when chaining gmetad instances together

The first two can show up for any of the clusters mentioned in the error
log. The third, however, only affects the last cluster listed in a grid
(telnet to port 8651 on a downstream gmetad and grep for CLUSTER). If the
last cluster in that output is always the one named in the error log, you
need to either turn scalability back on (it is on by default) or try this
patch:

--- old/process_xml.c   2011-04-12 18:59:49.000000000 +0100
+++ new/process_xml.c   2011-04-12 23:39:23.000000000 +0100
@@ -1156,7 +1156,7 @@ end (void *data, const char *el)
       {
          case GRID_TAG:
             rc = endElement_GRID(data, el);
-            /* No break. */
+            if (!gmetad_config.scalable_mode) break;
 
          case CLUSTER_TAG:
             rc = endElement_CLUSTER(data, el);
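
To make the effect of that one-line change easier to see, here is a minimal
standalone sketch of the same switch shape. It is not gmetad code: the enum,
the counters and the functions end_grid(), end_cluster(), handle_end(),
reset_counts() and cluster_calls() are hypothetical stand-ins I made up for
illustration, and the scalable_mode argument stands in for
gmetad_config.scalable_mode.

```c
/* Hypothetical stand-ins for gmetad's tag ids and end-element handlers. */
enum tag { GRID_TAG, CLUSTER_TAG, OTHER_TAG };

static int grid_end_calls = 0;
static int cluster_end_calls = 0;

static int end_grid(void)    { grid_end_calls++;    return 0; }
static int end_cluster(void) { cluster_end_calls++; return 0; }

/* Same shape as the patched switch: when scalable_mode is 0 the GRID
 * case stops after its own handler; otherwise it falls through and the
 * CLUSTER handler runs as well. */
int handle_end(enum tag t, int scalable_mode)
{
    int rc = 0;

    switch (t) {
    case GRID_TAG:
        rc = end_grid();
        if (!scalable_mode) break;
        /* fall through */
    case CLUSTER_TAG:
        rc = end_cluster();
        break;
    default:
        break;
    }
    return rc;
}

/* Small helpers so a caller can inspect what ran. */
void reset_counts(void)  { grid_end_calls = 0; cluster_end_calls = 0; }
int  cluster_calls(void) { return cluster_end_calls; }
```

Calling handle_end(GRID_TAG, 1) runs both handlers, as the unpatched code
always does, while handle_end(GRID_TAG, 0) runs only the grid handler, so
the cluster end handler is no longer triggered a second time when a grid
closes in non-scalable mode.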

Regards,
Nick
