Dan Diephouse wrote:
> James M Snell wrote:
>> The incremental parser model ensures that only the objects we actually
>> need will be loaded into memory. A better way to put it would be
>> parse-on-demand. Think of it as a hybrid between the SAX and DOM
>> approaches. The main advantage of this approach is that it uses
>> significantly less memory than DOM.
>
> For times when you're reading only the first part of the document I can
> see how this would result in less memory and quicker access times. But
> for someone who needs to access most of the document - i.e. scan through
> the entries in the feed - the whole document will still need to be
> scanned/parsed, so that shouldn't result in any difference in
> memory/time over the normal DOM approach. That is, an OMElementImpl
> will still be created at some point for each and every element, and
> each OMElement will still have attributes, child elements, etc.
> associated with it.
>
> For instance: http://www.ibm.com/developerworks/webservices/library/ws-java2/.
> I think the Axiom numbers have probably improved to more JDOM/DOM4j
> levels since then, but it still shows that, given equivalent documents
> which are eventually read/loaded into memory, Axiom will have the same
> order of magnitude memory characteristics as anything else out there.
True, but even when fully parsing a document, because of the way Axiom is
implemented, we still realize a significant memory and speed improvement
when working with the full document. I'd encourage you to run some of the
numbers yourself.

> Or am I missing something here? Abdera doesn't just skip over elements
> which aren't accessed sequentially, does it? Or are you saying that the
> benefit is just when you don't need to access the whole document? i.e.
> just read the feed metadata and not the entries?

Abdera only consumes the stream when it is absolutely necessary to do so.
Elements are not skipped over unless there is a ParseFilter in place
telling it to do so. If I have a Feed with 100 entries and all I do is
call feed.getTitle(), the 100 entries will never be parsed. Because Atom
requires that the entries come after the rest of the feed-level elements,
I can read all of the feed metadata without ever having to parse the
individual entries.

When I call feed.getEntries(), Abdera returns a special List
implementation that uses an internal iterator. That iterator incrementally
parses the stream, so if I do for (Entry entry : feed.getEntries()), each
loop iteration incrementally parses the stream. However, if I do
for (int n = 0; n < feed.getEntries().size(); n++), the call to size()
will cause the entire stream to be consumed in order to return the correct
number of entries.

>> Another advantage is that it means
>> we can introduce filters into the parsing process so that unwanted
>> elements are ignored completely (that's the ParseFilter stuff you see in
>> the core). To illustrate the difference, a while back we used ROME
>> (which uses JDOM) to parse Tim Bray's Atom feed and output just titles
>> and links to System.out. We used Abdera with a parse filter to do the
>> exact same test. The JDOM approach used over 6 MB of memory; the Abdera
>> approach used around 700 KB. The Abdera approach was significantly
>> faster as well.
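Abdera's lazy behavior sits on top of a StAX pull parser. As a rough
illustration of the on-demand model described above (plain javax.xml.stream,
not Abdera's actual API, and a made-up feed document), here is a minimal
sketch that reads a feed's title without ever touching the entries that
follow it:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class PullParseDemo {

    // Pull events only until the feed-level <title> is found; the
    // <entry> elements that follow it are never parsed.
    static String readFeedTitle(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "title".equals(reader.getLocalName())) {
                return reader.getElementText(); // stop here; entries untouched
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Atom requires feed metadata to come before the entries,
        // which is what makes this early exit safe.
        String xml =
            "<feed xmlns='http://www.w3.org/2005/Atom'>"
          + "<title>Example Feed</title>"
          + "<entry><title>Entry 1</title></entry>"
          + "<entry><title>Entry 2</title></entry>"
          + "</feed>";
        System.out.println(readFeedTitle(xml)); // prints "Example Feed"
    }
}
```

Axiom layers a lazily built object tree on this same pull model, which is
why feed.getTitle() alone never forces the entries into memory.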
>
> Were you skipping all the elements except for the titles? If so, a
> fairer comparison would have implemented a StAX/SAX filter for JDOM as
> well. Also, I'm not sure what parser you used for JDOM, but Woodstox is
> 1.5-10x faster than the standard SAX parsers IIRC, so that may have
> been a factor.

The test was based on the interfaces that ROME exposed at the time. From
what I recall, there was no way for us to plug in any kind of parse
filter. We could have just missed it, however.

- James

> - Dan
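For comparison, the whitelist-style filtering discussed above (keep only
titles and links, discard everything else) can be approximated in plain
StAX by simply not building objects for non-matching elements. This is a
hypothetical sketch of the idea, not Abdera's ParseFilter API:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class FilteredParseDemo {

    // Collect only <title> text and <link> href attributes; events for
    // every other element are discarded, so no objects are built for them.
    static List<String> titlesAndLinks(String xml) throws Exception {
        List<String> result = new ArrayList<>();
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                String name = reader.getLocalName();
                if ("title".equals(name)) {
                    result.add(reader.getElementText());
                } else if ("link".equals(name)) {
                    result.add(reader.getAttributeValue(null, "href"));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<feed xmlns='http://www.w3.org/2005/Atom'>"
          + "<title>Feed</title>"
          + "<link href='http://example.org/'/>"
          + "<entry><title>E1</title><summary>ignored</summary></entry>"
          + "</feed>";
        System.out.println(titlesAndLinks(xml));
        // prints [Feed, http://example.org/, E1]
    }
}
```

The memory savings in the ROME-vs-Abdera comparison come from this kind of
discard-at-parse-time behavior: filtered-out elements never become tree
nodes at all.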