Yes, the growth appears to happen in bursts with plateaus between growth cycles, so Ruby's GC could definitely be the culprit.
I am taking the document and converting it to Ruby objects that map (with some fudging) to the same structure as the XML. The XML originates from a SOAP-based web service. I am converting the response from XML into Ruby objects using recursion to walk inside each object and load any subobjects that are present in the XML. I am doing a rather involved series of type lookups so that I can cast the strings in the XML to the appropriate Ruby datatype based on the XSD the web service provides.

The memory growth happens even without any recursion or assignment going on; all I have to do to induce the growth is a few Node#find calls on the document.

The general structure of my document is something like this made-up example, except a whole lot bigger and with around 100 different node types represented throughout the document:

RestaurantGetResponse
  Restaurants
    Restaurant
      Location
        Street1
        Street2
        City
        Zip
      RestaurantName
      Owner
        Name
        PhoneNumber
      Genres
        Genre
        Genre
    Restaurant
      .....
    Restaurant
      .....
    Restaurant
      .....

I am making the parsed Ruby objects available to a Rails application, and I find that if I call GC.start when using the library with Rails it takes several seconds to garbage collect and sometimes crashes. If I call GC.start in the loop when the program is running as a standalone process, GC.start returns in a few dozen milliseconds.

I wrote a SAX-style parser using libxml-ruby that does not suffer from the memory growth, but it is about 30 times slower than the document-based parser, so I am really trying to make the document-based approach work.

Matt Margolis

On Mon, Aug 11, 2008 at 4:38 PM, Sean Chittenden <[EMAIL PROTECTED]> wrote:

>> I am parsing 120K of XML into a document and then running
>>
>> def get_nodes(node, namespace)
>>   self.find("./dn:#{node}", "dn:#{namespace}")
>> end
>>
>> several times.
>>
>> Memory usage for my test driver sits at 20 megs if I run get_nodes less
>> than 10 times. If I run get_nodes 1000 times my memory usage jumps from
>> 20 megs to around 140 megs and does not come back down until the process
>> exits. If I force a GC.start at the end of each loop I can keep the
>> memory usage down, but that is not practical in the real world where I
>> need this code to be at least somewhat fast.
>>
>> I am only building the document once during the entire duration of the
>> test program, so the parsing of the large string should not be a problem.
>>
>> Any ideas as to why my memory usage grows and then never comes down?
>
> If the memory usage caps off at certain levels but isn't continually
> growing (i.e. a leak), then this is a "problem" with the Ruby GC and not
> with libxml. libxml just leverages Ruby's GC for memory allocation, etc.
> See if there is an updated GC patch that you can apply. I don't have the
> URL handy, but this post makes reference to it:
>
> http://antoniocangiano.com/2007/02/10/top-10-ruby-on-rails-performance-tips/
>
> One could argue, however, that using GC.start is practical if done in
> tight loops. What exactly are you trying to do with your fragments?
> Maybe there's a more efficient way of getting the result you're
> interested in.
>
> -sc
>
> --
> Sean Chittenden
> [EMAIL PROTECTED]
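P.S. For what it's worth, here is a minimal sketch of the kind of recursive conversion loop I am describing, with Sean's suggestion of a GC.start inside the tight loop folded in. The TYPE_CASTS table, the namespace URI, the file name, and the type-lookup hash are all made up for illustration, and the libxml-ruby calls (XML::Parser, Node#find, Node#children, Node#content) are written from memory, so treat this as a rough outline rather than my actual code:

  require 'xml/libxml'
  require 'ostruct'

  # Hypothetical mapping from XSD types to cast procs; the real table is
  # derived from the XSD the web service publishes.
  TYPE_CASTS = {
    'xsd:int'     => lambda { |s| s.to_i },
    'xsd:decimal' => lambda { |s| s.to_f },
    'xsd:string'  => lambda { |s| s }
  }

  # Recursively turn an XML element into an OpenStruct, casting leaf text
  # nodes according to the (made-up) type lookup.
  def to_object(node, type_for_node)
    elements = node.children.select { |c| c.element? }
    if elements.empty?
      cast = TYPE_CASTS[type_for_node[node.name]] || TYPE_CASTS['xsd:string']
      return cast.call(node.content)
    end
    obj = OpenStruct.new
    elements.each do |child|
      obj.send("#{child.name.downcase}=", to_object(child, type_for_node))
    end
    obj
  end

  # Driver: parse one response document, convert each Restaurant node, and
  # force a GC sweep every so often to keep the heap from ballooning.
  type_for_node = { 'Zip' => 'xsd:int' }   # made-up example lookup
  parser = XML::Parser.new
  parser.string = File.read('response.xml')
  doc = parser.parse

  restaurants = doc.root.find('//dn:Restaurant', 'dn:http://example.com/ns')
  results = []
  restaurants.each_with_index do |node, i|
    results << to_object(node, type_for_node)
    GC.start if (i % 50).zero?              # amortize the collection cost
  end

Collecting every 50 iterations (or whatever interval testing suggests) seems like a reasonable middle ground between the runaway growth and the cost of a GC.start on every pass.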
_______________________________________________
libxml-devel mailing list
libxml-devel@rubyforge.org
http://rubyforge.org/mailman/listinfo/libxml-devel