Re: Introduction to icXML

John Snelson Mon, 03 Sep 2012 08:14:38 -0700

Hi Rob,

I've been following the development of Parabix for a number of years, and I'm excited that you're considering releasing it under the Apache Licence.

I contribute to an XQuery library called XQilla that is built on top of Xerces-C. It would be interesting to see if I could get XQilla to build on top of icXML as well, although XQilla makes heavy use of Xerces-C internal classes to get access to schema validation information.

It's great that you've benchmarked icXML at as much as twice as fast as Xerces-C. However, I would think you could get a lot better performance out of the Parabix engine, based on the performance numbers you've published in the past and on my experience with the Xerces-C code base.

In the past I've written an XML parser that was between 2x and 4x as fast as Xerces-C when I used it inside XQilla. Are you still working on code speed-ups, or do you have an idea where efforts need to be focused to further improve parsing speed?


John

On 01/09/12 16:51, Rob Cameron wrote:

To: Xerces-C Developers List

International Characters, Inc. has been developing a high-performance
XML parser based on the systematic restructuring of Xerces-C++
to incorporate Parabix (parallel bit stream) technology.  The parser is
called icXML, and we are now preparing to release it under the Apache
License in the hope that it will ultimately be accepted as a Xerces
subproject (with our continuing participation).

The performance improvements offered by icXML are dramatic.   Our
target is a 50% speed-up compared to Apache Xerces C++, although
we are measuring more than 100% speed-up (twice as fast) in some
applications.

Parabix technology is the result of an ongoing research program
at Simon Fraser University where I am professor of Computing Science.
It takes advantage of the SIMD capabilities of modern processors
and a novel transposition of character streams into parallel bit streams
to process up to 256 characters at a time.   icXML is based on the
second generation Parabix technology as described in our papers
appearing in the proceedings of EuroPar 2011 and HPCA 2012.
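
To make the transposition idea concrete, here is a minimal sketch in plain C++. It is not icXML or Parabix code: the names `BitStreams`, `transpose` and `find_less_than` are hypothetical, and it uses ordinary 64-bit words rather than SIMD registers, so it covers 64 characters per step instead of the 256 mentioned above. A block of bytes is turned into eight bit streams, and a few bitwise operations then locate every '<' in the block at once.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical names for illustration only; not taken from icXML or Parabix.
using BitStreams = std::array<std::uint64_t, 8>;

// Transpose up to 64 bytes into 8 bit streams: bit i of streams[k]
// holds bit k of byte i (k = 0 is the least significant bit).
BitStreams transpose(const std::string& block) {
    BitStreams streams{};
    const std::size_t n = block.size() < 64 ? block.size() : 64;
    for (std::size_t i = 0; i < n; ++i) {
        const auto byte = static_cast<unsigned char>(block[i]);
        for (int k = 0; k < 8; ++k) {
            if ((byte >> k) & 1u) {
                streams[k] |= (std::uint64_t{1} << i);
            }
        }
    }
    return streams;
}

// Mark every position that holds '<' (0x3C, binary 00111100).
// One pass of bitwise logic tests all 64 positions together.
std::uint64_t find_less_than(const BitStreams& s) {
    return ~s[7] & ~s[6] & s[5] & s[4] & s[3] & s[2] & ~s[1] & ~s[0];
}
```

Calling `find_less_than(transpose("<a><b/>"))` returns a mask with bits 0 and 3 set, one for each '<' in the input. A real implementation would presumably do the transposition itself with SIMD instructions rather than the bit-by-bit loop used here.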

At present, our working stable version is icXML 0.6, and we are
targeting icXML 0.7 which should be close to functionally complete for
UTF-8 and UTF-16 inputs and the IGXML scanner.   When a few
bugs are resolved, we hope to be able to package it up for public
access on an SVN server.

One thing that is not quite clear to me, though, is the best organization
for keeping our code in a common framework with existing Xerces
code.    We presently have some source subdirectories for our own
newly created files, while we have also made edits, both major and
minor, to many other Xerces source files in place.    Is there any way
that the autotools chain can be used to address these issues?
Any advice on structuring would be highly appreciated.

Parabix and icXML are trademarks of International Characters, Inc.

Robert D. Cameron, Ph.D.
CTO, International Characters, Inc.

---------------------------------------------------------------------
To unsubscribe, e-mail: c-dev-unsubscr...@xerces.apache.org
For additional commands, e-mail: c-dev-h...@xerces.apache.org

