All of the XML is sent within the intranet. In fact, with this latest test, all of the XML is passing through a single switch. It is a 1Gbps switch, and the switch itself can push 96Gbps split across all ports. The network is currently pushing 1MBps, so I don't think it is maxed out. I have never had any packet loss within my network, and large files are passed over it daily.
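As a rough check on that claim, the arithmetic can be sketched as below. It assumes "1MBps" means 10^6 bytes per second and that the traffic crosses a single 1Gbps port; the function name is only illustrative.

```python
LINK_BITS_PER_SEC = 1_000_000_000   # one 1Gbps switch port
OBSERVED_BYTES_PER_SEC = 1_000_000  # assumption: "1MBps" = 10^6 bytes per second


def utilization(bytes_per_sec: float, link_bits_per_sec: float) -> float:
    """Fraction of link capacity in use, converting bytes to bits first."""
    return bytes_per_sec * 8 / link_bits_per_sec


# Print the observed load as a percentage of one port's capacity.
print(f"{utilization(OBSERVED_BYTES_PER_SEC, LINK_BITS_PER_SEC):.1%}")
```

Under those assumptions the load is well under one percent of a single port.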
I don't believe the network is a factor for another simple reason: it worked fine with ganglia 3.0.7 from the time it was installed, at least 6 months ago. I also noted that I was running xmllint against the XML data from both gmetad and gmond, and it was unable to find any problem with the XML.

I did just have the web interface choke on the data again, the latest error being:

There was an error collecting ganglia data (127.0.0.1:8655): XML error: Invalid document end at 1067

An immediate refresh (within seconds) and the interface was back.

Thanks for your quick response,
Adam

On Thu, May 14, 2009 at 11:58, Richard Edward Horner <r...@richhorner.com> wrote:
> It may not be a problem with Ganglia. It may be a problem with your network.
>
> You're saying the line number in the error changes every time. That
> suggests to me that the transmission is getting fouled up at a
> different point each time which would be the expected behavior for an
> intermittent network problem. Is your network heavily taxed? Are all
> these machines local or do they talk over the WAN? Do you observe
> packet loss for anything? You may want to transfer some large files
> around and md5 them on the originating server and the destination
> server to see if they come across OK.
>
> Rich(ard)
>
> On Thu, May 14, 2009 at 4:47 PM, Adam Tygart <adam.tyg...@gmail.com> wrote:
>> Hello everyone,
>>
>> I have been having a heck of a time diagnosing this problem. I
>> recently updated to ganglia-3.1.2 from 3.0.7. Since then I have been
>> plagued with (what looked like) data errors, mis-reporting swap usage
>> was the easiest to see. This seems to be caused by some reporting
>> modules failing to load. They fail silently, I don't see logs about it
>> anywhere, and when I turn debugging on I still don't see anything.
>> Usually it is one of the modules, but I have had two occasionally
>> happen at the same time. modmem.so and modnet.so are the two to most
>> commonly fail.
>>
>> I have restarted with a new gmond configuration, changing only the
>> configuration of multicast to unicast, and this problem persists. I
>> have wiped my old rrd data. I have tried everything I know that could
>> even remotely be to blame for this problem.
>>
>> The question I have is this: is this a known bug? Is there something
>> else I should try? Can I force a module to be loaded?
>>
>> When the modules do load, hosts report to gmond, and gmetad grabs that
>> data and logs it. My webserver then serves up the data through the
>> ganglia interface. The problem I am having here is that I get
>> intermittent xml errors, mostly saying that there is a missing > on
>> line $SomeLineNumber (always changes). Happens every 15 minutes or so.
>> I cannot reproduce any problems with the xml, however. I ran xmllint
>> on the xml once per second for an hour with no errors, during which time
>> the web interface failed to load twice.
>>
>> I am also missing hosts from the web interface. The hosts (and
>> processors) get graphed properly on the composite graphs, but they
>> don't appear as "down," or as "up," they just disappear. I can enter
>> the hostname into the address bar, and get a current accurate graph
>> for it, though. Here is a screenshot of what I am talking about:
>> http://img.waffleimages.com/a47bc705ae3f5fd53a025e387ebbeb0c0841ad4a/Picture%2011.png
>>
>> If you'll notice, processor count says 10, while the graph shows 14.
>> This is because the host (janus) is missing from the list. Once in a
>> while, it will show up correctly (for one refresh) then disappear
>> again.
>>
>>
>> I am sorry that I have written a daunting wall of text, but I am in
>> need of fixing these issues to properly roll-out the interface.
>>
>> If it helps, ganglia was compiled on Gentoo through their build system
>> (portage).
>>
>> Thanks,
>>
>> Adam Tygart
>>
>> ------------------------------------------------------------------------------
>> The NEW KODAK i700 Series Scanners deliver under ANY circumstances! Your
>> production scanning environment may not be a perfect world - but thanks to
>> Kodak, there's a perfect scanner to get the job done! With the NEW KODAK i700
>> Series Scanner you'll get full speed at 300 dpi even with all image
>> processing features enabled. http://p.sf.net/sfu/kodak-com
>> _______________________________________________
>> Ganglia-general mailing list
>> Ganglia-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/ganglia-general
>>
>
>
>
> --
> Richard Edward Horner
> Engineer / Composer / Electric Guitar Virtuoso
> richhorner.com | rhosts.net | sabayonlinux.org
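The once-per-second xmllint check described in the thread could also be done from a script that reads the XML dump off the daemon's TCP port itself, which would show whether the dump ever arrives truncated or empty. The following is only a sketch under assumptions: the function names are hypothetical, the host and port are whatever your gmond or gmetad listens on (the error above mentions 127.0.0.1:8655), and it uses Python's standard XML parser rather than xmllint, so it tests well-formedness only.

```python
import socket
import time
import xml.etree.ElementTree as ET


def fetch_xml(host: str, port: int, timeout: float = 10.0) -> bytes:
    """Read everything the daemon sends until it closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            chunk = sock.recv(65536)
            if not chunk:  # an empty read means the peer closed the socket
                break
            chunks.append(chunk)
    return b"".join(chunks)


def check_xml(data: bytes) -> str:
    """Return "ok" if data is well-formed XML, else the parser's error text."""
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        return f"parse error: {exc}"
    return "ok"


def poll(host: str, port: int, count: int, interval: float = 1.0) -> list:
    """Fetch and check the dump `count` times; return the failures seen."""
    failures = []
    for _ in range(count):
        try:
            data = fetch_xml(host, port)
            result = check_xml(data)
        except OSError as exc:  # connection refused, reset, or timed out
            data, result = b"", f"socket error: {exc}"
        if result != "ok":
            # Record when it failed and how many bytes had arrived.
            failures.append((time.strftime("%H:%M:%S"), len(data), result))
        time.sleep(interval)
    return failures
```

Something like `poll("127.0.0.1", 8655, 3600)` would repeat the hour-long test. Comparing the byte count of a failed fetch with that of a good one may help tell a truncated dump apart from one that is complete but malformed.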