Charlie,

I am running on OS X and Red Hat. I am using the Node#find method with an XPath expression for the currently desired node in the default namespace of the document. The crashes stopped happening when I set my nodes variable to nil before calling GC.start. The memory does not spike too much if I call GC.start after every single Node#find, but since parsing a single document into the required number of Ruby objects necessitates calling Node#find over a thousand times, GC.start is really slowing things down.
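For reference, the pattern I have ended up with looks roughly like this (just a sketch; the file name, element name, and namespace URI are placeholders for the real ones):

    require 'xml/libxml'

    doc = XML::Document.file('large.xml')

    # Register a prefix for the document's default namespace so the
    # XPath expression can match nodes that live in it.
    nodes = doc.find('//ns:record', 'ns:http://example.com/schema')
    nodes.each do |node|
      # build the corresponding Ruby objects here
    end

    # Drop the reference to the XPath result before collecting, and do
    # it while the document is still alive, per the rdocs.
    nodes = nil
    GC.start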
>From what I can tell calling Node#find on such a large document is causing Ruby to add extra object heaps which increases my memory usage in a way that the program does not recover from. This is unfortunate since I want to run multiple processes per box but each process is using several hundred megabytes of RAM after parsing a few large documents. The SAX parser with empty callbacks can rip through the document in about 17ms which is very fast in my opinion. The speed problem arrises when I try to do anything in the callbacks. The nature of the program and the structure of the XML requires me to do quite few lookups in a series of hashes to determine the type of the current node and the type of each text element. When SAX parsing I have to hit the hashes more often since I don't have as much context information available as I do with a recursive depth first document walk with the document parser node objects. With the necessary code in the callbacks I was seeing parse times around 400ms which is about twice as slow as the document based approach. XMLReader looks very interesting from the API docs but I am not sure that I grok how to actually use it. I will keep searching for resources but if you know of any examples of usage out there I would love to read some code. Thank you, Matt Margolis 2008/8/16 Charlie Savage <[EMAIL PROTECTED]> > Hi Matt, > > I am making the parsed ruby objects available to a Rails application and I >> find that if I call GC.start when using the library with Rails that it takes >> several seconds to garbage collect and sometimes crashes. If I call >> GC.start in the loop when the program is running as a standalone process >> then GC.start returns in a few dozen milliseconds. >> > > What platform are you using? Can you run a debug version and get a stack > trace so we can see what is going on? Are you using XPath? If so, make > sure to free pointers to your XPath result objects and call GC.start before > the associated documents get freed (see the rdocs for more info, > document#find I think it is). > > I wrote a SAX style parser using libxml-ruby that does not suffer from the >> memory growth but it is about 30 times slower than the document based parser >> so I am really trying to make the document based approach work. >> > > Why do you suppose SAX is so much slower. It should be a lot faster since > it doesn't build an in-memory tree. > > Any chance the XMLReader would work for you? > > Charlie > > _______________________________________________ > libxml-devel mailing list > libxml-devel@rubyforge.org > http://rubyforge.org/mailman/listinfo/libxml-devel >