It may not be a problem with Ganglia. It may be a problem with your network.
You're saying the line number in the error changes every time. That suggests to me that the transmission is getting fouled up at a different point each time, which would be the expected behavior for an intermittent network problem.

Is your network heavily taxed? Are all these machines local, or do they talk over the WAN? Do you observe packet loss for anything? You may want to transfer some large files around and md5 them on the originating server and the destination server to see if they come across OK.

Rich(ard)

On Thu, May 14, 2009 at 4:47 PM, Adam Tygart <[email protected]> wrote:
> Hello everyone,
>
> I have been having a heck of a time diagnosing this problem. I
> recently updated to ganglia-3.1.2 from 3.0.7. Since then I have been
> plagued with (what looked like) data errors; mis-reporting swap usage
> was the easiest to see. This seems to be caused by some reporting
> modules failing to load. They fail silently: I don't see logs about it
> anywhere, and when I turn debugging on I still don't see anything.
> Usually it is one of the modules, but I have had two occasionally
> happen at the same time. modmem.so and modnet.so are the two that most
> commonly fail.
>
> I have restarted with a new gmond configuration, changing only the
> configuration of multicast to unicast, and this problem persists. I
> have wiped my old rrd data. I have tried everything I know that could
> even remotely be to blame for this problem.
>
> The question I have is this: is this a known bug? Is there something
> else I should try? Can I force a module to be loaded?
>
> When the modules do load, hosts report to gmond, and gmeta grabs that
> data and logs it. My webserver then serves up the data through the
> ganglia interface. The problem I am having here is that I get
> intermittent xml errors, mostly saying that there is a missing
> on
> line $SomeLineNumber (always changes). Happens every 15 minutes or so.
> I cannot reproduce any problems with the xml, however.
> I ran xmllint
> on the xml once per second for an hour with no errors, during which time
> the web interface failed to load twice.
>
> I am also missing hosts from the web interface. The hosts (and
> processors) get graphed properly on the composite graphs, but they
> don't appear as "down," or as "up," they just disappear. I can enter
> the hostname into the address bar, and get a current accurate graph
> for it, though. Here is a screenshot of what I am talking about:
> http://img.waffleimages.com/a47bc705ae3f5fd53a025e387ebbeb0c0841ad4a/Picture%2011.png
>
> If you'll notice, processor count says 10, while the graph shows 14.
> This is because the host (janus) is missing from the list. Once in a
> while, it will show up correctly (for one refresh) then disappear
> again.
>
>
> I am sorry that I have written a daunting wall of text, but I am in
> need of fixing these issues to properly roll out the interface.
>
> If it helps, ganglia was compiled on Gentoo through their build system
> (portage).
>
> Thanks,
>
> Adam Tygart
>
> ------------------------------------------------------------------------------
> The NEW KODAK i700 Series Scanners deliver under ANY circumstances! Your
> production scanning environment may not be a perfect world - but thanks to
> Kodak, there's a perfect scanner to get the job done! With the NEW KODAK i700
> Series Scanner you'll get full speed at 300 dpi even with all image
> processing features enabled. http://p.sf.net/sfu/kodak-com
> _______________________________________________
> Ganglia-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/ganglia-general
>

-- 
Richard Edward Horner
Engineer / Composer / Electric Guitar Virtuoso
richhorner.com | rhosts.net | sabayonlinux.org
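
For the file-transfer check suggested above (md5 a large file on the originating server and again on the destination), here is a minimal sketch, assuming Python is available on both machines. The function name `file_md5` and the 1 MiB chunk size are illustrative choices only; nothing here is part of Ganglia.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`.

    The file is read in chunks so that large test files do not
    have to fit in memory. `chunk_size` is an arbitrary default.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Run `print(file_md5("/path/to/large.file"))` (the path is a placeholder) on the sending host and again on the receiving host after the copy. Differing digests would mean the file was altered somewhere between the two hosts, which fits the intermittent-network theory; matching digests over several large transfers would make that theory less likely.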

