Lev Lvovsky wrote:
We're using the DOM parser on some rather large files and seeing our application's memory footprint get quite large. I've seen mention of "deferred node parsing" with the DOM parser, but haven't been able to find any methods to set this option. Is this something that's available in the xerces-c DOM parser?
Perhaps you are referring to a feature of Xerces-J:

http://xerces.apache.org/xerces2-j/features.html

"http://apache.org/xml/features/dom/defer-node-expansion

True:    Lazily expand the DOM nodes.
False:   Fully expand the DOM nodes.
Default: true

Note: In the LSParser implementation the default value of this feature is false.
Note: When this feature is set to true, the DOM nodes in the returned document are expanded as the tree is traversed. This allows the parser to return a document faster than if the tree is fully expanded during parsing, and improves memory usage when the whole tree is not traversed."
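In Xerces-J, that feature is set through the standard JAXP feature mechanism. As a minimal sketch (the class and method names `DeferFeature` and `tryEnableDeferredExpansion` are illustrative, not part of any API; whether the feature is accepted depends on which JAXP parser is on the classpath):

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

public class DeferFeature {
    // Feature URI from the Xerces-J documentation quoted above.
    static final String DEFER =
        "http://apache.org/xml/features/dom/defer-node-expansion";

    // Returns true if the underlying parser accepted the feature,
    // false if it does not recognize it.
    static boolean tryEnableDeferredExpansion(DocumentBuilderFactory factory) {
        try {
            factory.setFeature(DEFER, true);
            return true;
        } catch (ParserConfigurationException e) {
            // The JAXP implementation in use does not support deferral.
            return false;
        }
    }

    public static void main(String[] args) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        System.out.println(tryEnableDeferredExpansion(factory));
    }
}
```

Xerces-C++ has no equivalent knob on its DOM parser, which is why you can't find a method for it.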


We're starting off with a file that's 50MB in size, and it's only going to get bigger from there. Currently it takes up approximately 500MB of memory because it's XML wrapping escaped XML; once we extract the contents of the wrapped XML, memory use drops to the 250MB range, which is still a lot considering that the file will grow in length by at least 20x. Before I delve into the SAX parser, are there any other tricks for using DOM while minimizing the memory footprint?
The DOM already uses a fair amount of large block allocation to help keep down the heap overhead. I'm not sure you will find any general-purpose DOM implementation that will work for your situation.

Unless you really need random access to the entire document, I would suggest you switch to SAX parsing, or use a combination of SAX and DOM parsing where you split the document into a set of documents.
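The streaming approach Dave is suggesting keeps only the current parse state in memory instead of the whole tree. In xerces-c that means a SAX2XMLReader with a ContentHandler; here is the same idea as a self-contained Java sketch using the JDK's built-in SAX parser (the `ElementCounter` handler and `countElements` helper are illustrative names):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxCount {
    // Streaming handler: callbacks fire as the parser reads, so memory use
    // stays flat no matter how large the document is.
    static final class ElementCounter extends DefaultHandler {
        int count = 0;

        @Override
        public void startElement(String uri, String local, String qName,
                                 Attributes atts) {
            count++;
        }
    }

    static int countElements(String xml) throws Exception {
        ElementCounter handler = new ElementCounter();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                   handler);
        return handler.count;
    }

    public static void main(String[] args) throws Exception {
        // root, a, b, c
        System.out.println(countElements("<root><a/><b><c/></b></root>")); // prints 4
    }
}
```

For the hybrid approach, the same kind of handler can buffer the characters of each wrapped sub-document and hand just that fragment to a DOM parser, so only one fragment's tree is ever resident.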

Dave
