Quoting Bernard Li ([email protected]):
> Could you please elaborate on what this "program" is doing when it
> connects to gmetad?

We have a process which connects every 5-10 seconds or so to port 8651
to get a non-interactive dump of the XML.  While it does not matter to
the problem at hand, it then reduces the data down to a few
concentrated figures and presents that data via a pipe to an OpenGL
based display. 
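
For anyone wanting to reproduce the polling side, here is a minimal sketch of that kind of client in Python. It is not our actual program: the function names and the "reduction" (just counting HOST and METRIC elements) are made up for illustration, and it assumes gmetad writes the whole XML dump and then closes the connection.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmetad_xml(host="localhost", port=8651, timeout=10.0):
    """Read everything the server sends until it closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # empty read means the peer closed: dump is complete
                break
            chunks.append(data)
    return b"".join(chunks)


def summarize(xml_bytes):
    """Hypothetical reduction: count HOST and METRIC elements in the dump."""
    root = ET.fromstring(xml_bytes)
    hosts = sum(1 for _ in root.iter("HOST"))
    metrics = sum(1 for _ in root.iter("METRIC"))
    return {"hosts": hosts, "metrics": metrics}
```

Calling summarize(fetch_gmetad_xml()) in a loop with a 5-10 second sleep gives roughly the access pattern described above. The timeout applies to each socket operation, so a stalled gmetad shows up as a socket.timeout rather than a hung client.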

> While I haven't run Ganglia under Xen instances, if I were to make a
> guess, this is probably an I/O related issue.  Is there any chance you
> can run the gmetad instance on a bare metal box and see if your
> situation improves?  64 hosts x 40 metrics can be easily handled by a
> typical server.  It is usually when you get into the high hundreds and
> beyond that people usually need to implement the tmpfs workaround.

I suspect either I/O or CPU, though I am instrumenting the
mutexes to check whether I am right about the starvation that
strace showed me.  One futex() call was blocked for nearly 500
seconds... I am going to see who took the lock and such with a second
level of instrumentation if the first level proves it out.
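
To illustrate what I mean by instrumenting a mutex, here is the idea sketched in Python rather than in gmetad's C/pthreads code. The class name, the warn_after threshold and the "rrd" lock name in the usage note are all hypothetical; this only shows the technique of recording how long a waiter blocked and who appeared to hold the lock at the time.

```python
import threading
import time


class InstrumentedLock:
    """Lock wrapper that records who holds it and how long waiters block."""

    def __init__(self, name, warn_after=1.0):
        self._lock = threading.Lock()
        self.name = name
        self.warn_after = warn_after  # seconds of waiting worth recording
        self.holder = None            # name of the thread holding the lock
        self.slow_waits = []          # (waiter, holder seen, seconds waited)

    def __enter__(self):
        seen_holder = self.holder     # racy snapshot, fine for diagnostics
        start = time.monotonic()
        self._lock.acquire()
        waited = time.monotonic() - start
        me = threading.current_thread().name
        if waited >= self.warn_after:
            # Appended while holding the lock, so the list is not raced on.
            self.slow_waits.append((me, seen_holder, waited))
        self.holder = me
        return self

    def __exit__(self, exc_type, exc, tb):
        self.holder = None
        self._lock.release()
        return False
```

Wrapping a critical section as "with InstrumentedLock('rrd'):" (sharing one instance between threads) and dumping slow_waits afterwards shows which thread starved and which one it was apparently waiting on.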

As for bare metal... trying to get a box around here right now is a
pain... we have around 700 servers, but they are all allocated right now.

> Another thing you could try is rrdcached which is available in new
> versions of RRDtool.

I may do that.  But management would prefer almost no changes and really
fast solutions, and swapping in a new version of software generally
causes them to raise flags.

> Regarding the patch, if you are to make one, please do so against
> trunk as all code contribution needs to go there, and eventually
> backported to our branches.

Right now, what I have is mega ugly and cannot even handle
multiple sources.  At some point, though, I will see about redoing it
against trunk with cleaner code.

> Good luck with troubleshooting.

Thanks!


_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general
