James M Snell wrote:
The incremental parser model assures that only the objects we actually
need will be loaded into memory.  A better way to put it would be
parse-on-demand.  Think of it as a hybrid between the SAX and DOM
approaches.  The main advantage of this approach is that it uses
significantly less memory than DOM.
When you're reading only the first part of the document, I can see how this would result in less memory and quicker access times. But for someone who needs to access most of the document, i.e. scan through the entries in the feed, the whole document will still need to be scanned/parsed, so that shouldn't result in any difference in memory/time over the normal DOM approach. That is, an OMElementImpl will still be created at some point for each and every element. And each OMElement will still have attributes, child elements, etc. associated with it.

For instance, see http://www.ibm.com/developerworks/webservices/library/ws-java2/. I think the Axiom numbers have probably improved to something closer to JDOM/dom4j levels since then, but it still shows that, given equivalent documents which are eventually read/loaded into memory, Axiom will have memory characteristics of the same order of magnitude as anything else out there.

Or am I missing something here? Abdera doesn't just skip over elements which aren't accessed sequentially, does it? Or are you saying that the benefit only shows up when you don't need to access the whole document, i.e. you just read the feed metadata and not the entries?
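
To make the "only need the first part" case concrete, here is a minimal sketch using the plain JDK StAX API (javax.xml.stream), not Abdera or Axiom. The class name PullOnDemand and the method countEvents are made up for illustration. It counts how many parse events get pulled when we stop at the first entry versus when we walk the whole feed:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Abdera or Axiom.
class PullOnDemand {
    /**
     * Pulls events from a StAX reader and returns how many were consumed.
     * With stopAtFirstEntry set, the loop ends as soon as the first
     * <entry> start tag is seen, so the rest of the feed is never pulled.
     */
    static int countEvents(String xml, boolean stopAtFirstEntry) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            int events = 0;
            while (reader.hasNext()) {
                int event = reader.next();
                events++;
                if (stopAtFirstEntry
                        && event == XMLStreamConstants.START_ELEMENT
                        && "entry".equals(reader.getLocalName())) {
                    break; // feed-level metadata has been seen; stop here
                }
            }
            reader.close();
            return events;
        } catch (XMLStreamException e) {
            // Wrapped so callers do not need a checked exception.
            throw new IllegalStateException(e);
        }
    }
}
```

If the caller goes on to touch every entry, the second count is the one that applies, which is the case I'm asking about.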
Another advantage is that it means
we can introduce filters into the parsing process so that unwanted
elements are ignored completely (that's the ParseFilter stuff you see in
the core).  To illustrate the difference, a while back we used ROME
(which uses JDOM) to parse Tim Bray's Atom feed and output just titles
and links to System.out.  We used Abdera with a parse filter to do the
exact same test.  The JDOM approach used over 6MB of memory; the Abdera
approach used right around 700 KB of memory.  The Abdera approach was
significantly faster as well.

Were you skipping all the elements except for the titles? If so, a fairer comparison would have implemented a StAX/SAX filter for JDOM as well. Also, I'm not sure what parser you used for JDOM, but Woodstox is 1.5-10x faster than the standard SAX parsers IIRC, so that may have been a factor.
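
For reference, the kind of filter I mean could look roughly like the sketch below. It uses only the JDK's StAX reader, so it is neither Abdera's ParseFilter nor the ROME/JDOM setup from your test, and the class name TitleLinkFilter is made up. It assumes Atom 1.0 namespaced input and text-only titles (getElementText would throw on an xhtml title with child elements):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Abdera, ROME, or JDOM.
class TitleLinkFilter {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    /** Collects atom:title text and atom:link href values; ignores everything else. */
    static List<String> titlesAndLinks(String xml) {
        List<String> out = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                // Only start tags in the Atom namespace are of interest.
                if (reader.next() != XMLStreamConstants.START_ELEMENT) continue;
                if (!ATOM_NS.equals(reader.getNamespaceURI())) continue;
                String name = reader.getLocalName();
                if ("title".equals(name)) {
                    // Assumes a text-only title.
                    out.add("title: " + reader.getElementText());
                } else if ("link".equals(name)) {
                    out.add("link: " + reader.getAttributeValue(null, "href"));
                }
                // Any other element is skipped without building an object for it.
            }
            reader.close();
        } catch (XMLStreamException e) {
            // Wrapped so callers do not need a checked exception.
            throw new IllegalStateException(e);
        }
        return out;
    }
}
```

Nothing here builds a tree at all, so it is more of a lower bound than a like-for-like replacement for either library, but putting the same filter in front of JDOM would make the memory numbers easier to compare.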

- Dan

--
Dan Diephouse
MuleSource
http://mulesource.com | http://netzooid.com/blog
