Re: Introduction to icXML

Rob Cameron Mon, 03 Sep 2012 09:20:23 -0700

Hi, John.

Good questions and an interesting topic to investigate with respect
to XQilla on top of icXML.


Acceleration of existing software is challenging.  Amdahl's Law comes
into play.  If the cost of Xerces in an application is 50%, then we can
speed up the application by at most 2X, even if we could make the
Xerces parsing cost insignificant.  If the parsing cost is 75%,
and we can speed up parsing by (only!) 3X, then we are again limited
to a 2X end-to-end speedup for the application.
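
To make the arithmetic concrete, here is a rough sketch of Amdahl's Law
in Python.  It is not part of icXML; the function name amdahl_speedup
and its parameters are made up for illustration, and it assumes the
non-parsing part of the application is unchanged.

```python
def amdahl_speedup(parse_fraction, parse_factor):
    """End-to-end speedup when parsing, which takes `parse_fraction` of the
    original run time, is made `parse_factor` times faster (single core)."""
    if not 0.0 <= parse_fraction <= 1.0:
        raise ValueError("parse_fraction must be between 0 and 1")
    if parse_factor <= 0:
        raise ValueError("parse_factor must be positive")
    # New run time as a fraction of the original run time.
    new_time = (1.0 - parse_fraction) + parse_fraction / parse_factor
    return 1.0 / new_time


# Parsing is 50% of the run time and becomes essentially free.
print(amdahl_speedup(0.50, float("inf")))  # 2.0
# Parsing is 75% of the run time and becomes 3X faster.
print(amdahl_speedup(0.75, 3.0))           # 2.0
```

Both cases land at 2X: in the second, the remaining parsing time
(25% of the original) is as large as the application code itself.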

Fundamentally, the icXML project is limited by the need to replicate
the Xerces APIs.  We have previously looked at parsing cost alone,
treating XML well-formedness checking as the application.
Without the need to duplicate APIs, we can report well-formedness
up to about an order of magnitude faster than Xerces can, depending
on markup density.

However, these are speedups on a single core.  Our research now
also involves leveraging the SIMD parallelism of Parabix to take
advantage of multicore.  In this case, if we can run the application
code on one core and the parsing code on another, there are
further potential benefits.  In the example of an application
with a parsing cost of 75%, a 3X speedup of parsing would
allow us to reduce the cost of parsing to 25% of the original
execution time.  With the application code running on one
core in parallel with the parsing code on another, the potential
end-to-end speedup is 4X.  This is the kind of speedup we
are targeting, but it is the ideal case with a perfect balance
between the cost of the application and the parser.
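
The two-core case can be sketched the same way.  This is only an
idealized model, not a measurement: the function name two_core_speedup
is hypothetical, and it assumes the parser and the application overlap
perfectly, with no synchronization or data-transfer cost between cores.

```python
def two_core_speedup(parse_fraction, parse_factor):
    """Idealized end-to-end speedup when parsing (originally `parse_fraction`
    of the run time, now `parse_factor` times faster) runs on one core while
    the application code runs on another, with perfect overlap."""
    if not 0.0 <= parse_fraction <= 1.0:
        raise ValueError("parse_fraction must be between 0 and 1")
    if parse_factor <= 0:
        raise ValueError("parse_factor must be positive")
    app_time = 1.0 - parse_fraction            # unchanged application cost
    parse_time = parse_fraction / parse_factor  # accelerated parsing cost
    # With perfect overlap, the slower of the two cores sets the run time.
    return 1.0 / max(app_time, parse_time)


# Balanced case from the text: 75% parsing, 3X faster parser.
print(two_core_speedup(0.75, 3.0))  # 4.0
# Unbalanced case: 50% parsing, 3X faster parser; the application core
# becomes the bottleneck.
print(two_core_speedup(0.50, 3.0))  # 2.0
```

The second call shows why the 4X figure depends on balance: once the
application core is the bottleneck, further parser speedups do not help.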




On Mon, Sep 3, 2012 at 8:13 AM, John Snelson <[email protected]> wrote:
> Hi Rob,
>
> I've been following the development of Parabix for a number of years, and
> I'm excited that you're considering releasing it under the Apache Licence.
>
> I contribute to an XQuery library called XQilla that is based on top of
> Xerces-C. It would be interesting to see if I could get XQilla to build on
> top of icXML as well - although XQilla makes heavy use of Xerces-C internal
> classes to get access to schema validation information.
>
> It's great that you've benchmarked icXML at as much as twice as fast as
> Xerces-C. However I would think you could get a lot better performance out
> of the Parabix engine based on performance numbers you've published in the
> past, and experience with the Xerces-C code base.
>
> In the past I've written an XML parser that was between 2x and 4x as fast as
> Xerces-C when I used it inside XQilla. Are you still working on code speed
> ups, or do you have an idea where efforts need to be focused to still
> improve parsing speed?
>
> John
>
>
> On 01/09/12 16:51, Rob Cameron wrote:
>>
>> To: Xerces-C Developers List
>>
>> International Characters, Inc. has been developing a high-performance
>> XML parser based on the systematic restructuring of Xerces-C++
>> to incorporate Parabix (parallel bit stream) technology.    Called icXML,
>> we are now preparing to release this parser under the Apache License
>> in the hope that it will be ultimately accepted as a Xerces subproject
>> (with our continuing participation).
>>
>> The performance improvements offered by icXML are dramatic.   Our
>> target is a 50% speed-up compared to Apache Xerces C++, although
>> we are measuring more than 100% speed-up (twice as fast) in some
>> applications.
>>
>> Parabix technology is the result of an ongoing research program
>> at Simon Fraser University where I am professor of Computing Science.
>> It takes advantage of the SIMD capabilities of modern processors
>> and a novel transposition of character streams into parallel bit streams
>> to process up to 256 characters at a time.   icXML is based on the
>> second generation Parabix technology as described in our papers
>> appearing in the proceedings of EuroPar 2011 and HPCA 2012.
>>
>> At present, our working stable version is icXML 0.6, and we are
>> targeting icXML 0.7 which should be close to functionally complete for
>> UTF-8 and UTF-16 inputs and the IGXML scanner.   When a few
>> bugs are resolved, we hope to be able to package it up for public
>> access on an SVN server.
>>
>> One thing that is not quite clear to me, though, is the best organization
>> for keeping our code in a common framework with existing Xerces
>> code.    We presently have some source subdirectories for our own
>> newly created files, while we have also made edits, both major and
>> minor, to many other Xerces source files in place.    Is there any way
>> that the autotools chain can be used to address these issues?
>> Any advice on structuring would be highly appreciated.
>>
>> Parabix and icXML are trademarks of International Characters, Inc.
>>
>> Robert D. Cameron, Ph.D.
>> CTO, International Characters, Inc.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>

