On Wed, 2007-08-01 at 08:55 +0200, Jose Antonio Jimenez Baena wrote:
> I continuously get the ganglia messages, but only for '__SummaryInfo__'
> category:
>
> Aug 1 00:01:49 spdbinfpr1 user:info /opt/freeware/sbin/gmetad
> [745560]: RRD_update (/var/lib/ganglia/rrds/p570
> _spdbsctms1/__SummaryInfo__/cpu_system.rrd): illegal attempt to update
> using time 1185919308 when last update
> time is 1185919308 (minimum one second step)
>
> Could it be because in fact I have several headnodes reporting info
> for the same level (__SummaryInfo__)? If this is the case, how
> could it be avoided?
I took a look at the source code. All the writing to the rrd files is
done by the write_data_to_rrd() function. The first two arguments are
the "sourcename" (which I think is just the cluster name) and the
"hostname". It appears that function is called in three places when
gmetad processes the XML data:
1) startElement_METRIC() - When a <metric> element is seen,
write_data_to_rrd(xmldata->sourcename, xmldata->hostname, ...) is called
to write the metric info to the host-specific rrd file.
2) finish_processing_source() - I think this is called when the
</cluster> tag is seen. It invokes
write_data_to_rrd(xmldata->sourcename, NULL, ...). The NULL hostname
indicates that the metric rrd file in the cluster-specific
__SummaryInfo__ dir should be written to.
3) write_root_summary() - This invokes write_data_to_rrd(NULL,
NULL, ...). Using NULL for the sourcename and the hostname causes the
metric rrd file in the global __SummaryInfo__ dir to be updated.
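To make the three cases concrete, here is a rough sketch of how the two
arguments appear to select the target file. This is not gmetad's actual
code: rrd_path_for() and its parameters are names I made up, and the
host-level and global layouts are assumptions. Only the cluster-level
path is confirmed by the log message quoted above.

```c
#include <stdio.h>

/* Hypothetical helper (not part of gmetad): builds the rrd file path
 * from the sourcename/hostname pair, following the three cases above.
 * Returns what snprintf() returns. */
static int rrd_path_for(char *buf, size_t len, const char *rootdir,
                        const char *source, const char *host,
                        const char *metric)
{
    const char *summary = "__SummaryInfo__";

    if (source == NULL)   /* case 3: global summary (assumed layout) */
        return snprintf(buf, len, "%s/%s/%s.rrd",
                        rootdir, summary, metric);
    if (host == NULL)     /* case 2: cluster summary (matches the log) */
        return snprintf(buf, len, "%s/%s/%s/%s.rrd",
                        rootdir, source, summary, metric);
    /* case 1: host-specific file (assumed layout) */
    return snprintf(buf, len, "%s/%s/%s/%s.rrd",
                    rootdir, source, host, metric);
}
```

Calling it with rootdir "/var/lib/ganglia/rrds", source
"p570_spdbsctms1", a NULL host and metric "cpu_system" gives the same
path that shows up in the error message.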
I had two thoughts as to what might cause your problem, but I wasn't
able to test them (so they might be long shots):
1) It sounds like you have two "sources" with the same cluster name.
Maybe gmetad calls finish_processing_source() when it sees the </cluster>
tag for the first source. It then updates the {cluster}/__SummaryInfo__/
dir. When gmetad encounters the </cluster> tag for the second source of
the same name, it tries again to update files in
{cluster}/__SummaryInfo__/ using the same timestamp. This could then
cause the error.
2) The hostname for one of the nodes in the xml data is NULL. I'm not
sure how this could happen, but if it did, then when
startElement_METRIC() tries to update the host's metric info, it
actually calls write_data_to_rrd(xmldata->sourcename, NULL, ...). This
would update the __SummaryInfo__ dir instead. Later, when the
</cluster> tag is seen, gmetad calls finish_processing_source() to
update __SummaryInfo__ with the same timestamp. This might cause the
error.
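Both scenarios come down to the same thing: two updates to one file
with an identical timestamp. The toy model below shows why the second
one is rejected. It is only an illustration of the "minimum one second
step" rule from the log message, not rrdtool or gmetad code, and
struct toy_rrd and toy_rrd_update() are made-up names.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

#define MAX_FILES 16

/* Toy stand-in for an rrd file: it only remembers its last update. */
struct toy_rrd {
    char   path[128];
    time_t last_update;
};

static struct toy_rrd files[MAX_FILES];
static int nfiles;

/* Returns 0 on success, -1 if the timestamp is not later than the
 * last update for that path, -2 if the toy table is full. */
static int toy_rrd_update(const char *path, time_t t)
{
    for (int i = 0; i < nfiles; i++) {
        if (strcmp(files[i].path, path) == 0) {
            if (t <= files[i].last_update)
                return -1;          /* same (or older) second: rejected */
            files[i].last_update = t;
            return 0;
        }
    }
    if (nfiles == MAX_FILES)
        return -2;
    snprintf(files[nfiles].path, sizeof files[nfiles].path, "%s", path);
    files[nfiles].last_update = t;
    nfiles++;
    return 0;
}
```

If two sources share a cluster name, both resolve to the same
{cluster}/__SummaryInfo__/ path, so the second call with time
1185919308 returns -1, while an update one second later goes through.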
Like I said, these are shots in the dark. But if either sounds
plausible to you, it probably wouldn't be too hard for you to add a
couple of debug statements to the ganglia source to see if either of
them is the culprit.
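Something along these lines is what I have in mind for the debug
statements. It is a sketch only: format_rrd_debug() is a hypothetical
helper, and the parameters are my guess at what would be worth logging,
not the real write_data_to_rrd() signature.

```c
#include <stdio.h>

/* Hypothetical debug helper (not part of gmetad): formats one line
 * describing a write, printing "(null)" for missing names so that a
 * NULL hostname or sourcename is easy to spot in the output. */
static int format_rrd_debug(char *buf, size_t len, const char *caller,
                            const char *source, const char *host,
                            const char *metric, unsigned int when)
{
    return snprintf(buf, len, "%s: source=%s host=%s metric=%s time=%u",
                    caller,
                    source ? source : "(null)",
                    host ? host : "(null)",
                    metric ? metric : "(null)",
                    when);
}
```

Printing the resulting line to stderr at each of the three call sites
should show whether the same source, a "(null)" host, and the same
time appear twice in a row, and which function made each call.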
--
Rick Mohr
Systems Developer
Ohio Supercomputer Center
_______________________________________________
Ganglia-general mailing list
https://lists.sourceforge.net/lists/listinfo/ganglia-general