Re: Understanding Incremental Parsing [was Re: failing parser test]

Dan Diephouse Tue, 09 Oct 2007 14:39:26 -0700

James M Snell wrote:

The incremental parser model assures that only the objects we actually
need will be loaded into memory.  A better way to put it would be
parse-on-demand.  Think of it as a hybrid between the SAX and DOM
approaches.  The main advantage of this approach is that it uses
significantly less memory than DOM.

For times when you're reading only the first part of the document, I can see how this would result in less memory and quicker access times. But for someone who needs to access most of the document - i.e. scan through the entries in the feed - the whole document will still need to be scanned/parsed, so that shouldn't result in any difference in memory/time over the normal DOM approach. That is, an OMElementImpl will still be created at some point for each and every element. And each OMElement will still have attributes, child elements, etc. associated with it.

For instance - http://www.ibm.com/developerworks/webservices/library/ws-java2/. I think the Axiom numbers have probably improved to more JDOM/DOM4j levels since then, but still it shows that given equivalent documents which are eventually read/loaded into memory, it will have the same order of magnitude memory characteristics as anything else out there.

Or am I missing something here? Abdera doesn't just skip over elements which aren't accessed sequentially, does it? Or are you saying that the benefit is just when you don't need to access the whole document? i.e. just read the feed metadata and not the entries?
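
To make the "just read the feed metadata" case concrete, here is a minimal sketch using plain StAX from the JDK (javax.xml.stream). It is not Abdera's or Axiom's API; the class and method names are made up for the example, and it assumes the feed-level <title> holds plain text. The point is only that a pull parser can stop before the entries, so nothing is built for them.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class, not part of Abdera or Axiom.
class FeedTitleOnly {

    // Returns the first <title> seen before any <entry>, or null if none.
    // Stops pulling events as soon as the answer is known.
    static String feedTitle(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (r.hasNext()) {
                    if (r.next() != XMLStreamConstants.START_ELEMENT) {
                        continue;
                    }
                    String name = r.getLocalName();
                    if (name.equals("entry")) {
                        return null; // entries started, no feed-level title
                    }
                    if (name.equals("title")) {
                        // Assumes text-only content (no nested markup).
                        return r.getElementText();
                    }
                }
                return null;
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<feed xmlns='http://www.w3.org/2005/Atom'>"
                + "<title>Example</title>"
                + "<entry><title>First</title></entry>"
                + "</feed>";
        System.out.println(feedTitle(xml));
    }
}
```

If the caller then goes on to walk every entry, the same loop would have to visit every element anyway, which is the case I'm asking about.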

Another advantage is that it means
we can introduce filters into the parsing process so that unwanted
elements are ignored completely (that's the ParseFilter stuff you see in
the core).  To illustrate the difference, a while back we used ROME
(which uses JDOM) to parse Tim Bray's Atom feed and output just titles
and links to System.out.  We used Abdera with a parse filter to do the
exact same test.  The JDOM approach used over 6MB of memory; the Abdera
approach used right around 700 KB of memory.  The Abdera approach was
significantly faster as well.

Were you skipping all the elements except for the titles? If so, a fairer comparison would have implemented a StAX/SAX filter for JDOM as well. Also, I'm not sure what parser you used for JDOM, but Woodstox is 1.5-10x faster than the standard SAX parsers IIRC, so that may have been a factor.
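
For reference, a rough sketch of what filtering at the stream level could look like with the JDK's own StAX reader. It does not build a JDOM tree and is not the test that was actually run; the class name and output format are assumptions for illustration. It keeps only title text and link href values and lets every other event go by without creating any objects for it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class, not part of ROME, JDOM, or Abdera.
class TitlesAndLinks {

    // Collects title text and link href values in document order.
    // Everything else in the stream is skipped without being stored.
    static List<String> extract(String xml) {
        List<String> out = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (r.hasNext()) {
                    if (r.next() != XMLStreamConstants.START_ELEMENT) {
                        continue;
                    }
                    String name = r.getLocalName();
                    if (name.equals("title")) {
                        // Assumes text-only titles (no nested markup).
                        out.add(r.getElementText());
                    } else if (name.equals("link")) {
                        String href = r.getAttributeValue(null, "href");
                        if (href != null) {
                            out.add(href);
                        }
                    }
                }
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<feed xmlns='http://www.w3.org/2005/Atom'>"
                + "<title>Feed</title>"
                + "<entry><title>First</title>"
                + "<link href='http://example.org/1'/>"
                + "<content>ignored</content></entry>"
                + "</feed>";
        for (String s : extract(xml)) {
            System.out.println(s);
        }
    }
}
```

Measuring both libraries behind the same kind of filter, and with the same underlying parser, would separate the cost of the object model from the cost of parsing.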


- Dan

--
Dan Diephouse
MuleSource
http://mulesource.com | http://netzooid.com/blog
