I see this behavior all the time. Sometimes it's even a feature, though most of
the time it's a bug.
The easiest way to cause it is to have gmonds for two different data_source
lines using the same "cluster name" in their gmond.conf - which I think is what
you're doing. That makes two different gmetad threads write to the same place -
and, by the way, screws up your cluster-level statistics, since the two
different gmetad threads maintain different ideas of the sum of the metrics for
that cluster.
If you don't do many cluster-level reports, you won't notice the latter, but we
do a lot of them. On the other hand, it can be used as a feature: by
deliberately doing it (and ignoring your cluster reports), you can combine two
different multicast domains into the same cluster in the GUI. I suspect that's
the benefit you're looking for.
You'd be better off either treating the XML streams as separate clusters, or
combining the streams. We combine the streams most of the time: we use
separate "gmetric"-based monitors to push everything through the gmonds for a
cluster. The RRD_update message is telling you something really is wrong:
gmetad can't add correctly, because it's keeping a different set of sums for
apples than for oranges.
You can use different streams as long as you declare them as different clusters
in their gmond.confs. E.g., 'cluster-X-compute' versus 'cluster-X-ipmi'.
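Roughly, that would look like the sketch below. The host name and ports are made up for illustration; only the two cluster names come from the example above, and the rest of each file stays as you already have it.

```
# gmond.conf for the daemon reporting system performance
cluster {
  name = "cluster-X-compute"
}

# gmond.conf for the daemon reporting IPMI metrics
cluster {
  name = "cluster-X-ipmi"
}

# gmetad.conf: one data_source line per stream
# (node01 and both ports are hypothetical)
data_source "cluster-X-compute" node01:8649
data_source "cluster-X-ipmi" node01:8650
```

With distinct cluster names, each gmetad thread should keep its own RRDs and its own summary sums, at the cost of the two streams showing up as two clusters in the GUI.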
-- ReC
On 5/14/09 12:19 PM, "Michael Will" <[email protected]> wrote:
I have a cluster setup with /etc/gmetad.conf configured to pull XML data out of
two daemons for the same cluster.
Both have their own subset of data (one does system performance and the other
does IPMI metrics), and all seems to work well, except that sometimes gmetad
seems to pull the data from both so quickly that both reports use the same
timestamp. That should be OK since the metrics are not the same (i.e. first the
CPU load is recorded, then the CPU temperature is recorded from the second
stream). However, it then floods /var/log/messages with complaints that the
last update was not at least one second ago.
I was thinking of patching gmetad/rrd_helpers.c to log it as a debug message
instead of an error, since then it will only end up in syslog when gmetad is
started with debug>0.
Are there other (better) solutions to the issue, short of having to integrate
both XML streams into one?
I see previous postings with the issue but no solution:
http://www.mail-archive.com/[email protected]/msg03117.html
http://sourceforge.net/mailarchive/message.php?msg_id=480F60C6.8060902%40attributor.com
http://sourceforge.net/mailarchive/message.php?msg_id=4933BC4F.1030508%40cern.ch
The specific versions of ganglia-gmetad and rrdtool are:
ganglia-gmetad-3.0.7
rrdtool-1.0.49
The patch I had in mind was
diff -c gmetad-orig/rrd_helpers.c gmetad/rrd_helpers.c
*** gmetad-orig/rrd_helpers.c 2009-05-01 01:01:33.000000000 -0700
--- gmetad/rrd_helpers.c 2009-05-01 01:01:49.000000000 -0700
***************
*** 52,58 ****
     rrd_update(argc, argv);
     if(rrd_test_error())
        {
!          err_msg("RRD_update (%s): %s", rrd, rrd_get_error());
           pthread_mutex_unlock( &rrd_mutex );
           return 0;
        }
--- 52,58 ----
     rrd_update(argc, argv);
     if(rrd_test_error())
        {
!          debug_msg("RRD_update (%s): %s", rrd, rrd_get_error());
           pthread_mutex_unlock( &rrd_mutex );
           return 0;
        }
Cheers, Michael Will