All of the XML is sent within the intranet. In fact, in this latest
test all of the XML is passing through a single switch: a 1Gbps
switch that can itself push 96Gbps split across all ports. The
network is currently pushing about 1MBps, so I don't think it is
maxed out. I have never had any packet loss within my network, and
large files are passed over it daily.
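To rule the network out explicitly, Richard's md5 suggestion below can be scripted. This is a minimal sketch in Python (my choice of language, nothing ganglia-specific). The script name md5check.py and its arguments are hypothetical, and copying the test file between hosts is left to whatever you already use (scp, NFS, etc.):

```python
import hashlib
import sys

def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks so large files need not fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python md5check.py FILE [DIGEST_FROM_OTHER_HOST]
    digest = md5_of(sys.argv[1])
    print(digest, sys.argv[1])
    if len(sys.argv) > 2:
        print("MATCH" if digest == sys.argv[2] else "MISMATCH")
```

Run it on the originating server first, then on the destination with the first digest as the second argument. Any MISMATCH would point at the network rather than ganglia.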

I don't believe the network is a factor for another simple reason: it
worked fine with ganglia 3.0.7 from the time it was installed, at
least 6 months ago.

As I noted before, I ran xmllint against the XML data from both
gmetad and gmond, and it was unable to find any problem with the XML.
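The xmllint check can also be run against the live port rather than against saved copies. This is a rough Python sketch. It assumes the port simply dumps the full XML and then closes the connection (as gmond's TCP channel and gmetad's xml_port do). The host and port are arguments, for example the 127.0.0.1:8655 shown in the error below. Any response that fails to parse is saved to a bad-NNNN.xml file, a hypothetical naming scheme, so it can be inspected afterwards:

```python
import socket
import sys
import time
import xml.etree.ElementTree as ET

def fetch_xml(host, port, timeout=10.0):
    """Read everything the server sends until it closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # an empty read means the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)

def check_once(payload):
    """Return None if payload is well-formed XML, else the parser's error message."""
    try:
        ET.fromstring(payload)
    except ET.ParseError as exc:
        return str(exc)
    return None

def poll(host, port, interval=1.0, duration=3600):
    """Fetch and check the XML every `interval` seconds for `duration` seconds; return the failure count."""
    deadline = time.time() + duration
    failures = 0
    while time.time() < deadline:
        stamp = time.strftime("%H:%M:%S")
        try:
            payload = fetch_xml(host, port)
        except OSError as exc:  # includes connection refused and socket timeouts
            print(stamp, "connect/read error:", exc)
        else:
            err = check_once(payload)
            if err is not None:
                failures += 1
                name = "bad-%04d.xml" % failures
                with open(name, "wb") as f:
                    f.write(payload)
                print(stamp, len(payload), "bytes:", err, "->", name)
        time.sleep(interval)
    return failures

if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python xmlpoll.py HOST PORT
    print("failures:", poll(sys.argv[1], int(sys.argv[2])))
```

Unlike xmllint on a saved file, this reads the same socket the web frontend reads. A failure here that never shows up in saved copies would narrow down where the data goes bad.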

The web interface just choked on the data again; the latest error
was:

  There was an error collecting ganglia data (127.0.0.1:8655): XML error: Invalid document end at 1067

An immediate refresh (within seconds) brought the interface back.
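One way to test Richard's theory that the transmission is getting fouled up is to capture a failing response to a file. You can then check whether the parser gives up at the very end of the data, which suggests truncation, or somewhere in the middle, which suggests corruption. This is a small Python sketch using only the standard library. The function name describe_failure is hypothetical:

```python
import sys
import xml.etree.ElementTree as ET

def describe_failure(payload):
    """Return None if payload parses; otherwise report where the parser stopped vs. the size of the data."""
    try:
        ET.fromstring(payload)
        return None
    except ET.ParseError as exc:
        line, column = exc.position  # 1-based line, 0-based column reported by expat
        lines = payload.splitlines()
        return {
            "error": str(exc),
            "line": line,
            "column": column,
            "total_lines": len(lines),
            "total_bytes": len(payload),
            # The final line shows whether the data stops mid-tag.
            "last_line": lines[-1] if lines else b"",
        }

if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python describe_failure.py SAVED_RESPONSE.xml
    with open(sys.argv[1], "rb") as f:
        info = describe_failure(f.read())
    if info is None:
        print("parses cleanly")
    else:
        for key, value in info.items():
            print(f"{key}: {value}")
```

If the reported line is at or just past total_lines and last_line is cut off, the document was most likely truncated rather than garbled.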

Thanks for your quick response,
Adam

On Thu, May 14, 2009 at 11:58, Richard Edward Horner
<r...@richhorner.com> wrote:
> It may not be a problem with Ganglia. It may be a problem with your network.
>
> You're saying the line number in the error changes every time. That
> suggests to me that the transmission is getting fouled up at a
> different point each time which would be the expected behavior for an
> intermittent network problem. Is your network heavily taxed? Are all
> these machines local or do they talk over the WAN? Do you observe
> packet loss for anything? You may want to transfer some large files
> around and md5 them on the originating server and the destination
> server to see if they come across OK.
>
> Rich(ard)
>
> On Thu, May 14, 2009 at 4:47 PM, Adam Tygart <adam.tyg...@gmail.com> wrote:
>> Hello everyone,
>>
>> I have been having a heck of a time diagnosing this problem. I
>> recently updated to ganglia-3.1.2 from 3.0.7. Since then I have been
>> plagued with (what looked like) data errors; misreported swap usage
>> was the easiest to see. This seems to be caused by some reporting
>> modules failing to load. They fail silently: I don't see logs about it
>> anywhere, and when I turn debugging on I still don't see anything.
>> Usually it is one module, but occasionally two fail at the same
>> time. modmem.so and modnet.so are the two that most commonly
>> fail.
>>
>> I have restarted with a new gmond configuration, changing only the
>> configuration of multicast to unicast, and this problem persists. I
>> have wiped my old rrd data. I have tried everything I know that could
>> even remotely be to blame for this problem.
>>
>> The question I have is this: is this a known bug? Is there something
>> else I should try? Can I force a module to be loaded?
>>
>> When the modules do load, hosts report to gmond, and gmetad grabs that
>> data and logs it. My webserver then serves up the data through the
>> ganglia interface. The problem I am having here is that I get
>> intermittent XML errors, mostly saying that there is a missing > on
>> line $SomeLineNumber (the line number always changes). This happens
>> every 15 minutes or so. I cannot reproduce any problems with the XML,
>> however. I ran xmllint on the XML once per second for an hour with no
>> errors, during which time the web interface failed to load twice.
>>
>> I am also missing hosts from the web interface. The hosts (and
>> processors) get graphed properly on the composite graphs, but they
>> don't appear as "down," or as "up," they just disappear. I can enter
>> the hostname into the address bar, and get a current accurate graph
>> for it, though. Here is a screenshot of what I am talking about:
>> http://img.waffleimages.com/a47bc705ae3f5fd53a025e387ebbeb0c0841ad4a/Picture%2011.png
>>
>> If you'll notice, processor count says 10, while the graph shows 14.
>> This is because the host (janus) is missing from the list. Once in a
>> while, it will show up correctly (for one refresh) then disappear
>> again.
>>
>>
>> I am sorry that I have written a daunting wall of text, but I need
>> to fix these issues before I can properly roll out the interface.
>>
>> If it helps, ganglia was compiled on Gentoo through their build system
>> (portage).
>>
>> Thanks,
>>
>> Adam Tygart
>>
>> ------------------------------------------------------------------------------
>> The NEW KODAK i700 Series Scanners deliver under ANY circumstances! Your
>> production scanning environment may not be a perfect world - but thanks to
>> Kodak, there's a perfect scanner to get the job done! With the NEW KODAK i700
>> Series Scanner you'll get full speed at 300 dpi even with all image
>> processing features enabled. http://p.sf.net/sfu/kodak-com
>> _______________________________________________
>> Ganglia-general mailing list
>> Ganglia-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/ganglia-general
>>
>
>
>
> --
> Richard Edward Horner
> Engineer / Composer / Electric Guitar Virtuoso
> richhorner.com | rhosts.net | sabayonlinux.org
>
