Quoting Bernard Li ([email protected]):
> Could you please elaborate on what this "program" is doing when it
> connects to gmetad?

We have a process which connects every 5-10 seconds or so to port 8651
to get a non-interactive dump of the XML. While it does not matter to
the problem at hand, it then reduces the data down to a few
concentrated figures and presents that data via a pipe to an
OpenGL-based display.

> While I haven't run Ganglia under Xen instances, if I were to make a
> guess, this is probably an I/O related issue. Is there any chance you
> can run the gmetad instance on a bare metal box and see if your
> situation improves? 64 hosts x 40 metrics can be easily handled by a
> typical server. It is usually when you get into the high hundreds and
> beyond that people usually need to implement the tmpfs workaround.

I suspect either I/O or CPU, though I am instrumenting the mutexes to
see whether I am right about the starvation the strace showed me: a
futex() call blocked for nearly 500 seconds. If the first level of
instrumentation proves it out, I will use the second level to see who
did the locking and such.

As for bare metal... trying to get a box around here right now is a
pain. We have around 700 servers, but they are all allocated.

> Another thing you could try is rrdcached which is available in new
> versions of RRDtool.

I may do that. But management would prefer almost no changes and really
fast solutions, and swapping in a new version of software generally
causes them to raise flags.

> Regarding the patch, if you are to make one, please do so against
> trunk as all code contribution needs to go there, and eventually
> backported to our branches.

Right now, what I have is mega ugly, and could not even handle multiple
sources. At some point, though, I will see about redoing it against
trunk with cleaner code.

> Good luck with troubleshooting.

Thanks!
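
For anyone wanting to reproduce the polling described above, here is a rough sketch of that kind of client. It is not our actual program: the function names, the default host, and the choice of "load_one" as the metric to reduce are all assumptions for illustration. It relies only on gmetad writing its XML dump to the port and then closing the connection.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmetad_xml(host="localhost", port=8651, timeout=10.0):
    """Read one full XML dump; gmetad closes the socket when it is done.

    host and timeout are placeholder defaults, not values from our setup.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # empty read means the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def summarize(xml_bytes, metric_name="load_one"):
    """Reduce the dump to a few figures for one metric (hypothetical reduction)."""
    root = ET.fromstring(xml_bytes)
    values = []
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == metric_name:
                try:
                    values.append(float(metric.get("VAL")))
                except (TypeError, ValueError):
                    pass  # skip missing or non-numeric values
    if not values:
        return {"hosts": 0, "mean": None, "max": None}
    return {
        "hosts": len(values),
        "mean": sum(values) / len(values),
        "max": max(values),
    }
```

Calling summarize(fetch_gmetad_xml()) in a loop with a 5-10 second sleep gives roughly the access pattern we are putting on gmetad. The socket timeout matters here: without it, a stalled gmetad would hang the client as well.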
_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general

