I used to have problems similar to this one. It turned out that a
malformed gmetric value was "poisoning" gmetad. I suggest you check the
XML output of the data source as soon as gmetad starts reporting this
error, and see what's going on at that line in the XML...
(sounds like fun, doesn't it?)
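For what it's worth, here is a rough sketch of how that check could be scripted. It assumes the data source is a gmond that dumps its XML when you connect to TCP port 8649 (the usual default; adjust it if your gmetad.conf says otherwise). The host name and both function names are made up for illustration. It runs the dump through Python's expat bindings, so the line numbers and messages should be comparable to what gmetad logs, though I can't promise they match in every case.

```python
import socket
from xml.parsers import expat


def fetch_xml(host, port=8649, timeout=10.0):
    """Read the whole XML dump the data source sends on connect.

    Port 8649 is an assumption (the usual gmond default).
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # peer closed the connection: dump is complete
                break
            chunks.append(data)
    return b"".join(chunks)


def find_parse_error(xml_bytes, context=2):
    """Return None if the XML parses, else (lineno, message, snippet).

    snippet is a list of (line_number, text) pairs around the bad line.
    """
    parser = expat.ParserCreate()
    try:
        parser.Parse(xml_bytes, True)
    except expat.ExpatError as err:
        lines = xml_bytes.split(b"\n")
        start = max(err.lineno - 1 - context, 0)
        end = min(err.lineno + context, len(lines))
        # latin-1 maps every byte to a character, so decoding never fails
        snippet = [(n + 1, lines[n].decode("latin-1")) for n in range(start, end)]
        return err.lineno, expat.ErrorString(err.code), snippet
    return None


if __name__ == "__main__":
    # Hypothetical host name; use one of the hosts from your data_source line.
    result = find_parse_error(fetch_xml("gmond-node.example.com"))
    if result is None:
        print("XML is well-formed")
    else:
        lineno, message, snippet = result
        print("parse error at line %d: %s" % (lineno, message))
        for number, text in snippet:
            print("%6d: %s" % (number, text))
```

Running something like this in a loop and saving any dump that fails to parse would let you catch the offending metric even when the error only shows up every few hours.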
Jason A. Smith wrote:
We recently started having problems with gmetad just dying
unexpectedly, with no explanation in the system log. The only unusual
thing is a few xml parse errors several hours before gmetad dies:
Sep 3 04:50:14 ganglia01 /usr/sbin/gmetad[16437]: Process XML (Cluster
1): XML_ParseBuffer() error at line 5467: not well-formed
Sep 3 07:01:11 ganglia01 /usr/sbin/gmetad[16435]: Process XML (Cluster
2): XML_ParseBuffer() error at line 388: duplicate attribute
Sep 3 07:01:12 ganglia01 /usr/sbin/gmetad[16437]: Process XML (Cluster
1): XML_ParseBuffer() error at line 2383: not well-formed
Sep 3 07:01:25 ganglia01 /usr/sbin/gmetad[16437]: Process XML (Cluster
1): XML_ParseBuffer() error at line 4368: not well-formed
Sep 3 09:46:13 ganglia01 /usr/sbin/gmetad[16429]: Process XML (Cluster
3): XML_ParseBuffer() error at line 569: not well-formed
Sep 3 09:46:13 ganglia01 /usr/sbin/gmetad[16437]: RRD_update
(/var/lib/ganglia/rrds/Cluster 1/rcas2100.rcf.bnl.gov/mem_free.rrd):
illegal attempt to update using time 1062596767 when last update time is
1062596767 (minimum one second step)
Sep 3 09:46:13 ganglia01 /usr/sbin/gmetad[16437]: Process XML (Cluster
1): XML_ParseBuffer() error at line 1850: not well-formed
Sep 3 12:14:41 ganglia01 /usr/sbin/gmetad[16430]: Process XML (Cluster
4): XML_ParseBuffer() error at line 2: junk after document element
Sep 3 12:34:13 ganglia01 /usr/sbin/gmetad[16436]: Process XML (Cluster
5): XML_ParseBuffer() error at line 1298: not well-formed
Sep 3 13:12:20 ganglia01 /usr/sbin/gmetad[16434]: Process XML (Cluster
6): XML_ParseBuffer() error at line 527: not well-formed
Then gmetad dies almost an hour later.
Any ideas what the problem could be? I have tried restarting gmetad
with debugging and will wait to see if it happens again.
~Jason