This project seems to provide substantial performance advantages in XML processing. Take a look at the benchmark paper: http://www.ximpleware.com/.
Anne

On 7/29/05, Dennis Sosnoski <[EMAIL PROTECTED]> wrote:
> Looks like we've got a thread going, Eran!
>
> Dan, I don't think anyone has done a performance analysis for a typed
> parser as such. It'd really need to be done in the context of some sort
> of data binding framework to be meaningful. The only thing which has
> been done along these lines that I'm aware of is Sun's "FAST Web
> Services", which merged mutant forms of JAXB and JAX-RPC so that they
> could do binary input/output. In their case they used ASN.1
> encoding/decoding of the binary data, with the ASN.1 representation
> generated from an XML Schema.
>
> They saw much faster performance than the conventional JAX-RPC code.
> But, my own JibxSoap (a subproject of JiBX, http://www.jibx.org)
> delivers performance that appears to be about as good while still using
> standard text XML. I say "appears to be" because at the time I did the
> web services performance comparisons
> (http://www.sosnoski.com/presents/cleansoap/comparing.html) the Sun
> stuff was all proprietary. They've since opened it up on java.net, I
> think, though I don't know what kind of license restrictions might apply.
>
> My own gut feeling is that if I used a typed parser interface for binary
> input/output with JiBX/JibxSoap I could probably get 2-2.5 X the
> processing speed of text (vs. probably about 1.4-1.8 X with my XBIS
> binary XML format, which still keeps values as text and can be
> translated to and from the text representation).
>
> There are actually some other areas where parser usability could be
> improved, though, besides implementing a typed interface. I think
> implementing a parser that supplied element and attribute names as
> singleton QName objects of some form (rather than separate namespace
> URI, local name, and qualified name text values) would be a big gain,
> for instance. The text APIs could also be better designed; in the case
> of the StAX XMLStreamReader, rather than returning an array plus start
> offset plus length for element content, all using separate method calls,
> it'd be cleaner to just return the equivalent of a JDK 1.5 CharSequence
> (which could be reusable). Likewise on the attribute values, where StAX
> returns Strings. Returning CharSequence-equivalents would not only avoid
> unnecessary String creation (in the case of attribute values), it would
> also eliminate the need to translate the raw byte stream to character
> arrays for common encodings (especially the UTF-8 and UTF-16 used in
> BP-compliant web services).
>
> Unfortunately, I think developers sometimes misapply Knuth's (or Hoare's
> - I'm not sure who got this started) "premature optimization is the root
> of all evil" aphorism by designing APIs without any thought to
> performance. Once performance bottlenecks have been built into the APIs
> it's very difficult to get around them without scrapping things and
> starting over.
>
> - Dennis
>
> Dan Diephouse wrote:
>
> > Has anyone done any performance tests (binary or just plain text) with
> > the typed stax stuff? Does it really make a difference?
> > - Dan
> >
> > Eran Chinthaka wrote:
> >
> >> Hi Dennis,
> >>
> >> You have commented on typed pull parser in wiki. Shall we start a thread
> >> about it here?
> >>
> >> -- EC
> >>
> >>> -----Original Message-----
> >>> From: Apache Wiki [mailto:[EMAIL PROTECTED]
> >>> Sent: Thursday, July 28, 2005 10:31 PM
> >>> To: [email protected]
> >>> Subject: [Ws Wiki] Update of
> >>> "FrontPage/Axis2/Tasks/BinarySerialization"
> >>> by DennisSosnoski
> >>>
> >>> Dear Wiki user,
> >>>
> >>> You have subscribed to a wiki page or wiki category on "Ws Wiki" for
> >>> change notification.
> >>>
> >>> The following page has been changed by DennisSosnoski:
> >>> http://wiki.apache.org/ws/FrontPage/Axis2/Tasks/BinarySerialization
> >>>
> >>> ------------------------------------------------------------------------------
> >>> decoding the binary into an int, converting to a string for the parser
> >>> API and then back to an int in the deserialisation code.
> >>>
> >>> + I (DennisSosnoski) would personally disagree with the above
> >>> assessment.
> >>> A typed pull parser would definitely be nice, but even without this you
> >>> can get substantial size and performance gains from a binary format.
> >>> See my articles on devWorks at
> >>> http://www-128.ibm.com/developerworks/xml/library/x-trans1.html and
> >>> http://www-128.ibm.com/developerworks/xml/library/x-trans2/index.html
> >>> for examples.
> >>> +
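
For anyone following the thread, here is a rough sketch of the reusable CharSequence idea Dennis describes above. It is only an illustration under stated assumptions: the class name CharSpan and its reset() method are hypothetical and not part of StAX or any existing parser. The point is that a parser could hand back one view object over its internal buffer, instead of the separate getTextCharacters() / getTextStart() / getTextLength() calls on XMLStreamReader, and a String would only be allocated if the caller asked for one.

```java
// Hypothetical illustration only: CharSpan is NOT part of StAX or any real parser API.
// It is a reusable, read-only view over a region of a parser-owned char[] buffer.
final class CharSpan implements CharSequence {
    private char[] buf = new char[0];
    private int start;
    private int len;

    /** Re-point this view at another region of the buffer; nothing is copied. */
    CharSpan reset(char[] buf, int start, int len) {
        if (start < 0 || len < 0 || start > buf.length - len) {
            throw new IndexOutOfBoundsException("start=" + start + ", len=" + len);
        }
        this.buf = buf;
        this.start = start;
        this.len = len;
        return this;
    }

    @Override
    public int length() {
        return len;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= len) {
            throw new IndexOutOfBoundsException("index=" + index);
        }
        return buf[start + index];
    }

    @Override
    public CharSequence subSequence(int from, int to) {
        if (from < 0 || to > len || from > to) {
            throw new IndexOutOfBoundsException("from=" + from + ", to=" + to);
        }
        // A new view over the same buffer; still no character copying.
        return new CharSpan().reset(buf, start + from, to - from);
    }

    /** The only place a String is allocated, and only on request. */
    @Override
    public String toString() {
        return new String(buf, start, len);
    }
}

final class CharSpanDemo {
    public static void main(String[] args) {
        // Stand-in for a parser's internal buffer.
        char[] buffer = "<price>1250</price>".toCharArray();
        CharSpan span = new CharSpan();

        span.reset(buffer, 1, 5);   // the element name
        System.out.println("price".contentEquals(span));

        span.reset(buffer, 7, 4);   // the element content, same view object reused
        System.out.println(span);
    }
}
```

Note that the view is only valid until the parser next touches its buffer, which is the usual trade-off with this kind of design; a caller that needs to keep the value has to call toString() first.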
