Yes, the growth appears to happen in bursts with plateaus between growth
cycles, so Ruby's GC could definitely be the culprit.

I am taking the document and converting it to Ruby objects that map (with
some fudging) to the same structure as the XML.  The XML originates from a
SOAP-based web service.  I am converting the response from XML into Ruby
objects, using recursion to walk inside each element and load any
subobjects that are present in the XML.  I am also doing a rather involved
series of type lookups so that I can cast the strings in the XML to the
appropriate Ruby datatype based on the XSD the web service provides.
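
As a rough sketch of what that conversion looks like (the element names and
the XSD_TYPES lookup table here are made up; the real table is generated
from the service's XSD, and the real code builds typed objects rather than
hashes):

  require 'xml/libxml'

  # Hypothetical XSD-derived lookup: element name => cast for its text.
  XSD_TYPES = {
    'Zip'         => lambda { |s| s.to_i },
    'PhoneNumber' => lambda { |s| s.strip },
    # ...around 100 more entries in the real code
  }

  # Walk the element tree recursively, loading subobjects as we go.
  # Repeated elements (like Genre) would need array handling here --
  # part of the "fudging" mentioned above.
  def to_ruby(node)
    elements = node.children.select { |c| c.element? }
    if elements.empty?
      caster = XSD_TYPES[node.name]
      caster ? caster.call(node.content) : node.content
    else
      elements.inject({}) do |obj, child|
        obj[child.name] = to_ruby(child)
        obj
      end
    end
  end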

The memory growth happens even without any recursion or assignment going on;
all I have to do to induce the growth is a few Node#find calls on the
document.
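
For reference, this is all it takes to trigger the growth (the namespace
URI here is made up):

  parser = XML::Parser.new
  parser.string = xml          # the ~120K SOAP response
  doc = parser.parse           # document is built once, up front

  1000.times do
    # no recursion, no assignment -- just the XPath lookups
    doc.root.find('./dn:Restaurants', 'dn:http://example.com/restaurants')
  end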

The general structure of my document is something like this made-up
example, except a whole lot bigger and with around 100 different node types
represented throughout the document.

RestaurantGetResponse
  Restaurants
    Restaurant
      Location
         Street1
         Street2
         City
         Zip
      RestaurantName
      Owner
        Name
        PhoneNumber
      Genres
        Genre
        Genre
    Restaurant
     .....
    Restaurant
     .....
    Restaurant
     .....
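
In Ruby, the converted objects then read something like this (accessor
names follow the made-up example above and are hypothetical):

  restaurant = response.restaurants.first
  restaurant.location.zip    # => 60606 (an Integer, cast per the XSD)
  restaurant.owner.name      # => "Jane Doe" (a String)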

I am making the parsed Ruby objects available to a Rails application, and I
find that if I call GC.start when using the library with Rails, it takes
several seconds to garbage collect and sometimes crashes.  If I call
GC.start in the loop when the program is running as a standalone process,
GC.start returns in a few dozen milliseconds.
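
The standalone numbers come from a loop along these lines (a sketch of my
test driver, not the exact code):

  1000.times do
    doc.root.find('./dn:Restaurants', 'dn:http://example.com/restaurants')
    started = Time.now
    GC.start
    puts "GC.start took #{Time.now - started}s"  # tens of milliseconds
  end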

I wrote a SAX-style parser using libxml-ruby that does not suffer from the
memory growth, but it is about 30 times slower than the document-based
parser, so I am really trying to make the document-based approach work.
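
For comparison, the SAX version hangs off libxml-ruby's callback interface,
roughly like this (handler bodies trimmed; the real ones do the same
stack-based object building and XSD type casting described above):

  class RestaurantHandler
    include XML::SaxParser::Callbacks

    def on_start_element(name, attributes)
      # push a fresh object for this element onto a stack
    end

    def on_characters(chars)
      # buffer text content for the current element
    end

    def on_end_element(name)
      # cast the buffered text per the XSD and attach it to the parent
    end
  end

  parser = XML::SaxParser.new
  parser.string = xml
  parser.callbacks = RestaurantHandler.new
  parser.parse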

Matt Margolis


On Mon, Aug 11, 2008 at 4:38 PM, Sean Chittenden <[EMAIL PROTECTED]> wrote:

>> I am parsing 120K of XML into a document and then running
>>
>>  def get_nodes(node, namespace)
>>    self.find("./dn:#{node}", "dn:#{namespace}")
>>  end
>>
>>  several times.
>>
>> Memory usage for my test driver sits at 20 megs if I run get_nodes less
>> than 10 times.  If I run get_nodes 1000 times my memory usage jumps from 20
>> megs to around 140 megs and does not come back down until the process exits.
>>  If I force a GC.start at the end of each loop I can keep the memory usage
>> down but that is not practical in the real world where I need this code to
>> be at least somewhat fast.
>>
>> I am only building the document once during the entire duration of the
>> test program so the parsing of the large string should not be a problem.
>>
>> Any ideas as to why my memory usage grows and then never comes down?
>>
>
> If the memory usage caps off at certain levels but isn't continually
> growing (i.e. a leak), then this is a "problem" with the Ruby GC and not
> with libxml.  libxml just leverages Ruby's GC for memory allocation, etc.
>  See if there is an updated GC patch that you can apply.  I don't have the
> URL handy, but this post makes reference to it:
>
>
> http://antoniocangiano.com/2007/02/10/top-10-ruby-on-rails-performance-tips/
>
> One could argue, however, that using GC.start is practical if done in tight
> loops.  What exactly are you trying to do with your fragments?  Maybe
> there's a more efficient way of getting the result you're interested in.
>
> -sc
>
> --
> Sean Chittenden
> [EMAIL PROTECTED]
>
_______________________________________________
libxml-devel mailing list
libxml-devel@rubyforge.org
http://rubyforge.org/mailman/listinfo/libxml-devel
