Hello everyone,

I have been having a heck of a time diagnosing this problem. I
recently updated to ganglia-3.1.2 from 3.0.7. Since then I have been
plagued with (what looked like) data errors; mis-reported swap usage
was the easiest to see. This seems to be caused by some reporting
modules failing to load. They fail silently: I don't see logs about it
anywhere, and when I turn debugging on I still don't see anything.
Usually it is one of the modules, but I have occasionally had two
fail at the same time. modmem.so and modnet.so are the two that most
commonly fail.
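
One way to see which module went missing on which host is to look at
the metric names in the XML that gmond serves. The following is only a
minimal sketch, not a definitive check. The metric names listed per
module (mem_free, swap_free, bytes_in, bytes_out) and the sample
document are assumptions for illustration; adjust them to whatever the
modules actually export on your build.

```python
import xml.etree.ElementTree as ET

# Assumed metric names per module; adjust to what your build exports.
EXPECTED = {
    "modmem.so": {"mem_free", "swap_free"},
    "modnet.so": {"bytes_in", "bytes_out"},
}


def missing_modules(xml_text):
    """Map each host to the modules whose expected metrics are all absent."""
    root = ET.fromstring(xml_text)
    report = {}
    for host in root.iter("HOST"):
        seen = {m.get("NAME") for m in host.iter("METRIC")}
        # A module counts as missing when none of its metrics were reported.
        absent = sorted(mod for mod, names in EXPECTED.items()
                        if not names & seen)
        if absent:
            report[host.get("NAME")] = absent
    return report


# Hypothetical sample in the shape of gmond's XML output.
SAMPLE = """<GANGLIA_XML VERSION="3.1.2" SOURCE="gmond">
<CLUSTER NAME="test">
<HOST NAME="janus"><METRIC NAME="bytes_in" VAL="1"/></HOST>
<HOST NAME="other"><METRIC NAME="mem_free" VAL="1"/>
<METRIC NAME="bytes_out" VAL="2"/></HOST>
</CLUSTER></GANGLIA_XML>"""

print(missing_modules(SAMPLE))  # {'janus': ['modmem.so']}
```

Running this against each node's gmond output after a restart would at
least show which module failed where, even though nothing is logged.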

I have restarted with a new gmond configuration, changing only the
configuration of multicast to unicast, and this problem persists. I
have wiped my old rrd data. I have tried everything I know that could
even remotely be to blame for this problem.

The question I have is this: is this a known bug? Is there something
else I should try? Can I force a module to be loaded?

When the modules do load, hosts report to gmond, and gmetad grabs that
data and logs it. My webserver then serves up the data through the
ganglia interface. The problem I am having here is that I get
intermittent XML errors, mostly saying that there is a missing > on
line $SomeLineNumber (the number always changes). This happens every
15 minutes or so. I cannot reproduce any problems with the XML,
however. I ran xmllint on the XML once per second for an hour with no
errors, during which time the web interface failed to load twice.
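
A polling check of this kind can also be done over the TCP socket
directly, keeping the raw bytes of any document that fails to parse so
it can be inspected afterwards. This is a rough sketch under stated
assumptions: the function names are made up for illustration, and port
8649 is gmond's default TCP port, so point it at whichever host and
port your web frontend actually reads from.

```python
import socket
import time
import xml.etree.ElementTree as ET


def fetch_xml(host="localhost", port=8649, timeout=5.0):
    """Read everything the daemon sends until it closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def parse_error(data):
    """Return None if data is well-formed XML, else the parser's message."""
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        return str(exc)
    return None


def poll(count, interval=1.0, fetch=fetch_xml):
    """Fetch `count` times; return (timestamp, message, raw bytes) per failure."""
    failures = []
    for _ in range(count):
        try:
            data = fetch()
            err = parse_error(data)
        except OSError as exc:
            # Connection refused, reset, or timed out.
            data, err = b"", "fetch failed: %s" % exc
        if err is not None:
            failures.append((time.time(), err, data))
        time.sleep(interval)
    return failures
```

Calling poll(3600) would repeat the one-hour test; comparing the
timestamps of any failures with the times the web interface broke
might show whether both are seeing the same bad document.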

I am also missing hosts from the web interface. The hosts (and
processors) get graphed properly on the composite graphs, but they
don't appear as "down" or as "up"; they just disappear. I can enter
the hostname into the address bar and get a current, accurate graph
for it, though. Here is a screenshot of what I am talking about:
http://img.waffleimages.com/a47bc705ae3f5fd53a025e387ebbeb0c0841ad4a/Picture%2011.png

As you can see, the processor count says 10, while the graph shows 14.
This is because the host (janus) is missing from the list. Once in a
while, it will show up correctly (for one refresh) and then disappear
again.
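
To compare what the interface lists with what the daemon actually
reports, the hosts and their cpu_num metric can be pulled from the
same XML. This is a small sketch only; the function name is
hypothetical, and it assumes each HOST element carries a cpu_num
METRIC, as in gmond's usual output.

```python
import xml.etree.ElementTree as ET


def host_cpu_counts(xml_text):
    """Map each host name to its reported cpu_num (None if not reported)."""
    root = ET.fromstring(xml_text)
    cpus = {}
    for host in root.iter("HOST"):
        count = None
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == "cpu_num":
                count = int(float(metric.get("VAL")))
        cpus[host.get("NAME")] = count
    return cpus


def total_cpus(cpus):
    """Sum the CPU counts, skipping hosts that did not report cpu_num."""
    return sum(n for n in cpus.values() if n is not None)
```

If janus appears in this mapping with its CPUs while the web page
omits it, the data is reaching the daemon and the loss is happening
somewhere in the frontend.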


I am sorry that I have written a daunting wall of text, but I need
to fix these issues before I can properly roll out the interface.

If it helps, ganglia was compiled on Gentoo through their build system
(portage).

Thanks,

Adam Tygart

_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general