Hi David, (Note I've restricted this to [EMAIL PROTECTED] and xerces-j-dev@; this will get Xerces-J-specific so probably won't interest any of the other lists much).
Thanks for answering my other questions! Also, thanks for pointing to those interesting performance numbers; it's been a while since I've seen a fresh XML processor benchmark. The methodology behind the tests isn't discussed very much though: In particular, what options did you give Xerces? For instance, were you using our "deferred DOM"--a DOM implementation that tries to refrain from "fluffing up" objects until they're needed? There's evidence that this slows performance on small documents, but can help on large documents where not all nodes are visited. Also, is parser start-up time factored in? We believe that the most common use-cases where performance is critical will involve reuse of parser objects; therefore, we always use at least 100 "warm-up" iterations on a given parser object we want to test before actually recording performance numbers. Finally, were Xerces grammar-caching capabilities used for the validation tests? Naturally, reading a schema is a pretty slow process; we always assume that users who care about performance will take care to preparse schemas so that they don't incur this cost at instance-validation time. 
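To make the warm-up point concrete, here is a rough sketch of the kind of measurement loop I mean. It uses only the standard JAXP `DocumentBuilder` API rather than any Xerces-specific option, and the class name `WarmupBench`, the method `averageParseNanos`, and the sample document are made up for illustration; this is not our actual test harness.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class WarmupBench {
    /**
     * Returns the average nanoseconds per parse, measured only after
     * `warmups` untimed iterations on the same parser object.
     * (Hypothetical helper, not part of any Xerces test suite.)
     */
    static long averageParseNanos(byte[] xml, int warmups, int measured) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // One parser object, reused for every iteration, so that
        // start-up cost is paid once and kept out of the timing.
        DocumentBuilder builder = factory.newDocumentBuilder();

        // Warm-up: let class loading and JIT compilation settle.
        for (int i = 0; i < warmups; i++) {
            builder.parse(new ByteArrayInputStream(xml));
        }

        long start = System.nanoTime();
        for (int i = 0; i < measured; i++) {
            Document doc = builder.parse(new ByteArrayInputStream(xml));
            // Touch the result so the parse cannot be skipped as dead work.
            if (doc.getDocumentElement() == null) {
                throw new IllegalStateException("no root element");
            }
        }
        return (System.nanoTime() - start) / measured;
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<order id=\"1\"><item qty=\"2\">widget</item></order>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println("avg ns/parse: " + averageParseNanos(xml, 100, 1000));
    }
}
```

A harness that creates a fresh factory and parser inside the timed loop would mostly be measuring start-up, which is why I ask how the published numbers were taken.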
Cheers,
Neil

Neil Graham
XML Parser Development
IBM Toronto Lab
Phone: 905-413-3519, T/L 969-3519
E-mail: [EMAIL PROTECTED]

"David Bau" <[EMAIL PROTECTED]>
07/06/2003 07:05 AM
Please respond to general

To: <[EMAIL PROTECTED]>, "Jakarta General List" <[EMAIL PROTECTED]>
cc: <[EMAIL PROTECTED]>
Subject: RE: XMLBeans performance and source code status [Re: Proposal: XMLBeans]

Adding a few links and other info -

Aleksander Slominski wrote:
> http://dev2dev.bea.com/articles/hitesh_seth.jsp that is
> good overview but has not enough technical details and
> other docs): as far as i can understand actual objects

Above you've linked to an XML Journal review reprint. Here is a page that points to other information:

http://dev2dev.bea.com/technologies/xmlbeans/index.jsp

One of the links is a very brief summary of some brutally transparent and upfront performance and test compliance numbers:

http://workshop.bea.com/xmlbeans/schemaandperf.jsp

BTW, despite the fact that we posted the numbers on pretty marketing pages on bea.com, the numbers above are not marketing-varnished numbers - they are the actual measurements that we developers track day-to-day. Those are numbers we measure to help us focus on use-cases that we're working on making faster.

The XML cursor access _without_ strong-type conversion is between 10% and 58% faster than Xerces2 DOM access, going to about 35% for large (1 MB) XML documents. Xerces2, btw, is extremely speedy, so we're proud to be on par with it in any scenario!
Adding strong-type conversion (for example, parsing xs:int to Java int and dates to Calendars) adds enough cost that reading the data out of a document is between 0% and 48% slower than reading it out using (untyped) Xerces2 DOM. Apples-to-apples, we measure ourselves significantly faster than JAXB RI and Castor (140% to 282% and 66% to 800%). Please don't sue me - those are our real numbers, but if performance is important to your application, you should measure it for yourself.

We do fault-in object allocations when demanded, and you can see in our memory test that when we fault-in all the objects for a whole document, we take up more memory than Xerces2 DOM. One current project is to take steps to reduce that number. When we use XmlCursor and don't fault-in all the objects, you will find the memory number to be much slimmer. (I don't have a measurement because our measurements focus on problem areas we're actually working on.)

Eric Vasilik writes:
> The synchronization described refers to the fact
> that one may manipulate the XML via the XmlCursor
> or the strongly typed XMLBean classes generated from
> the schema

As Eric says, we don't want to confuse the two uses of the word "synchronize". But since Aleksander brought it up - here's some information on thread-synchronization too.

We examined both with- and without-thread-synchronized access, and found that without thread-sync, programmers fall into traps like working with XML config files on multiple threads in thread-unsafe ways without being aware of it. We found that it costs between 1% (strongly-typed access) and 10% (XmlCursor access) to synchronize. So we currently synchronize access to the data, paying for more [app] stability with a little bit of perf. We'd like to provide the option to single-threaded (or savvy) users of not synchronizing to get the 1-10% back. That's future work.
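To illustrate the trade-off only, here is a minimal sketch of what an opt-out of synchronization could look like. The class `OptionalSyncStore` and its constructor flag are hypothetical and are not XMLBeans code or API; the real store is considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;

public class OptionalSyncStore {
    private final List<String> values = new ArrayList<>();
    // null means the caller has promised single-threaded use (hypothetical design).
    private final Object lock;

    OptionalSyncStore(boolean synchronizedAccess) {
        this.lock = synchronizedAccess ? new Object() : null;
    }

    void add(String value) {
        if (lock == null) {
            values.add(value);          // fast path, no locking cost
            return;
        }
        synchronized (lock) {           // safe path for shared documents
            values.add(value);
        }
    }

    int size() {
        if (lock == null) {
            return values.size();
        }
        synchronized (lock) {
            return values.size();
        }
    }

    public static void main(String[] args) throws Exception {
        OptionalSyncStore shared = new OptionalSyncStore(true);
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    shared.add("x");
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        System.out.println(shared.size()); // prints 4000
    }
}
```

With the flag set to false the same four-thread run could lose updates or throw, which is exactly the trap we saw programmers fall into; that is why synchronized is the default and the unsynchronized path would be opt-in.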
As Eric pointed out, the key I think is not in what our current numbers are, but the fact that we've isolated our implementation from our interface so that we have the flexibility of reducing allocations, deferring work, and otherwise improving performance further in the future. Abstracting the primary store behind a cursor rather than a tree of objects with identity gives us some leeway in shuffling our implementation strategy in the future without restructuring the APIs.

David Bau

---------------------------------------------------------------------
In case of troubles, e-mail: [EMAIL PROTECTED]
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]