James M Snell wrote:
> The incremental parser model ensures that only the objects we actually
> need will be loaded into memory. A better way to put it would be
> parse-on-demand. Think of it as a hybrid between the SAX and DOM
> approaches. The main advantage of this approach is that it uses
> significantly less memory than DOM.
For times when you're reading only the first part of the document, I can
see how this would result in less memory and quicker access times. But
for someone who needs to access most of the document, i.e. scan through
the entries in the feed, the whole document will still need to be
scanned/parsed, so that shouldn't result in any difference in
memory/time over the normal DOM approach. That is, an OMElementImpl
will still be created at some point for each and every element, and
each OMElement will still have attributes, child elements, etc.
associated with it.
For instance, see
http://www.ibm.com/developerworks/webservices/library/ws-java2/. I think
the Axiom numbers have probably improved to more JDOM/DOM4j levels since
then, but it still shows that given equivalent documents which are
eventually read/loaded into memory, Axiom will have the same order of
magnitude memory characteristics as anything else out there.
Or am I missing something here? Abdera doesn't just skip over elements
which aren't accessed sequentially, does it? Or are you saying that the
benefit is just when you don't need to access the whole document, i.e.
when you just read the feed metadata and not the entries?
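To make the "metadata only" case concrete, here is a rough sketch using
plain StAX from the JDK. It is not Abdera or Axiom code; the class and
method names are made up for illustration, and it assumes the feed title
is plain text with no nested markup. It pulls events only until it finds
the feed-level title, and gives up as soon as the first entry starts:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; not part of Abdera or Axiom.
class FeedTitleOnly {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    // Returns the feed-level title, or null if an entry starts before one
    // is seen. Assumes a plain-text title (no nested markup).
    static String feedTitle(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (r.hasNext()) {
                    if (r.next() != XMLStreamConstants.START_ELEMENT) continue;
                    if (!ATOM_NS.equals(r.getNamespaceURI())) continue;
                    String name = r.getLocalName();
                    // First entry reached: the feed metadata is over.
                    if ("entry".equals(name)) return null;
                    // Found the title: read its text and stop pulling events.
                    if ("title".equals(name)) return r.getElementText();
                }
                return null;
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse feed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<feed xmlns='http://www.w3.org/2005/Atom'>"
                + "<title>Example</title>"
                + "<entry><title>First</title></entry>"
                + "</feed>";
        System.out.println(feedTitle(xml));
    }
}
```

Nothing after the title is ever turned into an object here, which is the
case where I'd expect parse-on-demand to win. Once you walk every entry,
that advantage goes away.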
> Another advantage is that it means
> we can introduce filters into the parsing process so that unwanted
> elements are ignored completely (that's the ParseFilter stuff you see in
> the core). To illustrate the difference, a while back we used ROME
> (which uses JDOM) to parse Tim Bray's Atom feed and output just titles
> and links to System.out. We used Abdera with a parse filter to do the
> exact same test. The JDOM approach used over 6 MB of memory; the Abdera
> approach used right around 700 KB of memory. The Abdera approach was
> significantly faster as well.
Were you skipping all the elements except for the titles? If so, a
fairer comparison would've implemented a StAX/SAX filter for JDOM as
well. Also, I'm not sure what parser you used for JDOM, but Woodstox is
1.5-10x faster than the standard SAX parsers IIRC, so that may have been
a factor.
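By a StAX-level filter I mean something along these lines. This is only
a sketch with the JDK's StAX API, not the ROME/JDOM or Abdera code from
your test; the class and method names are invented, and it assumes
plain-text titles. It keeps Atom titles and link hrefs and never builds
a tree for anything else:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; not part of JDOM, ROME or Abdera.
class TitlesAndLinks {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    // Collects every Atom title and link href in document order.
    // All other elements are passed over without creating objects for them.
    static List<String> titlesAndLinks(String xml) {
        List<String> out = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (r.hasNext()) {
                    if (r.next() != XMLStreamConstants.START_ELEMENT) continue;
                    if (!ATOM_NS.equals(r.getNamespaceURI())) continue;
                    String name = r.getLocalName();
                    if ("title".equals(name)) {
                        // Assumes a plain-text title (no nested markup).
                        out.add("title: " + r.getElementText());
                    } else if ("link".equals(name)) {
                        String href = r.getAttributeValue(null, "href");
                        if (href != null) out.add("link: " + href);
                    }
                }
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse feed", e);
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<feed xmlns='http://www.w3.org/2005/Atom'>"
                + "<title>Feed</title>"
                + "<entry><title>First</title>"
                + "<link href='http://example.org/1'/>"
                + "<summary>skipped</summary></entry>"
                + "</feed>";
        for (String line : titlesAndLinks(xml)) System.out.println(line);
    }
}
```

With the same kind of filtering in front of JDOM, I'd expect the memory
gap to shrink, since neither side would be holding the elements that get
thrown away.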
- Dan
--
Dan Diephouse
MuleSource
http://mulesource.com | http://netzooid.com/blog