Lev Lvovsky wrote:
We're using the DOM parser on some rather large files and seeing our application's memory footprint get quite large. I've seen mention of "deferred node parsing" with the DOM parser, but haven't been able to find any methods to set this option. Is this something that's available in the xerces-c DOM parser?
Perhaps you are referring to a feature of Xerces-J:

http://xerces.apache.org/xerces2-j/features.html

"http://apache.org/xml/features/dom/defer-node-expansion

True:    Lazily expand the DOM nodes.
False:   Fully expand the DOM nodes.
Default: true

Note: In the LSParser implementation the default value of this feature is false.
Note: When this feature is set to true, the DOM nodes in the returned document are expanded as the tree is traversed. This allows the parser to return a document faster than if the tree is fully expanded during parsing, and improves memory usage when the whole tree is not traversed."
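In Xerces-J, that feature is set through the standard JAXP feature mechanism. As a minimal sketch (the class and method names `DeferFeature` and `tryEnableDeferredExpansion` are illustrative, not part of any API; whether the feature is accepted depends on which JAXP parser is on the classpath):

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

public class DeferFeature {
    // Feature URI from the Xerces-J documentation quoted above.
    static final String DEFER =
        "http://apache.org/xml/features/dom/defer-node-expansion";

    // Returns true if the underlying parser accepted the feature,
    // false if it does not recognize it.
    static boolean tryEnableDeferredExpansion(DocumentBuilderFactory factory) {
        try {
            factory.setFeature(DEFER, true);
            return true;
        } catch (ParserConfigurationException e) {
            // The JAXP implementation in use does not support deferral.
            return false;
        }
    }

    public static void main(String[] args) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        System.out.println(tryEnableDeferredExpansion(factory));
    }
}
```

Xerces-C++ has no equivalent knob on its DOM parser, which is why you can't find a method for it.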


We're starting off with a file that's 50MB in size, and it's only going to get bigger from there. Currently it takes up approximately 500MB of memory because it's XML wrapping escaped XML; once we extract the contents of the wrapped XML, memory use drops to the 250MB range, which is still a lot considering that the file will grow in length by at least 20x. Before I delve into the SAX parser, are there any other tricks for using DOM while minimizing the memory footprint?
The DOM already uses a fair amount of large block allocation to help keep down the heap overhead. I'm not sure you will find any general-purpose DOM implementation that will work for your situation.

Unless you really need random access to the entire document, I would suggest you switch to SAX parsing, or use a combination of SAX and DOM parsing where you split the document into a set of documents.
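The streaming approach Dave is suggesting keeps only the current parse state in memory instead of the whole tree. In xerces-c that means a SAX2XMLReader with a ContentHandler; here is the same idea as a self-contained Java sketch using the JDK's built-in SAX parser (the `ElementCounter` handler and `countElements` helper are illustrative names):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxCount {
    // Streaming handler: callbacks fire as the parser reads, so memory use
    // stays flat no matter how large the document is.
    static final class ElementCounter extends DefaultHandler {
        int count = 0;

        @Override
        public void startElement(String uri, String local, String qName,
                                 Attributes atts) {
            count++;
        }
    }

    static int countElements(String xml) throws Exception {
        ElementCounter handler = new ElementCounter();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                   handler);
        return handler.count;
    }

    public static void main(String[] args) throws Exception {
        // root, a, b, c
        System.out.println(countElements("<root><a/><b><c/></b></root>")); // prints 4
    }
}
```

For the hybrid approach, the same kind of handler can buffer the characters of each wrapped sub-document and hand just that fragment to a DOM parser, so only one fragment's tree is ever resident.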

Dave
