Re: [C3] StAX research reveiled!

Sylvain Wallez Sat, 27 Dec 2008 01:36:45 -0800

Michael Seydl wrote:

Hi all!
One more mail for the student group! Behind this lurid topic hides our evaluation of the latest XML processing technologies regarding their usability in Cocoon3 (especially whether they are suited to be used in a streaming pipeline). As is commonly known, we decided to use StAX as our weapon of choice for XML processing, but this paper should explain the whys and hows, and especially the path we took to reach our decision, which resulted in using that very API. Eleven pages shouldn't be too big a read, and the paper contains all the necessary links to the APIs we evaluated, along with a few lines of our two cents about each API we looked at. Finally, we also tried to show the difference between the currently used SAX API and the StAX API we propose.
I hope this work sheds some light on our decision making and that someone dares to read it.
That's all from me. I wish you all a pleasant and very merry Christmas!

Regards,
Michael Seydl


Good work and an interesting read, but I don't agree with some of its statements!

The big if/else or switch statements mentioned as a drawback of the cursor API (XMLStreamReader) in section 1.2.4 also apply to the event API, since it provides abstract events whose type also needs to be inspected to decide what to do.
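
To illustrate, here is a minimal sketch (DispatchDemo and its methods are hypothetical names, written only for this example) that renders the same simplified markup with both APIs. The event-API version needs exactly the same switch on the event type as the cursor version; it only reads the data from an event object instead of from the reader.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.XMLEvent;

// Hypothetical demo: the same type-based dispatch written against both StAX APIs.
class DispatchDemo {
    private static final XMLInputFactory FACTORY = XMLInputFactory.newInstance();

    // Cursor API: the reader itself holds the state of the current event.
    static String withCursor(String xml) throws XMLStreamException {
        XMLStreamReader reader = FACTORY.createXMLStreamReader(new StringReader(xml));
        StringBuilder out = new StringBuilder();
        while (reader.hasNext()) {
            switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    out.append('<').append(reader.getLocalName()).append('>');
                    break;
                case XMLStreamConstants.CHARACTERS:
                    out.append(reader.getText());
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    out.append("</").append(reader.getLocalName()).append('>');
                    break;
                default:
                    break; // comments, PIs, END_DOCUMENT, ... are ignored
            }
        }
        reader.close();
        return out.toString();
    }

    // Event API: each event is an object, but its type must still be switched on.
    static String withEvents(String xml) throws XMLStreamException {
        XMLEventReader reader = FACTORY.createXMLEventReader(new StringReader(xml));
        StringBuilder out = new StringBuilder();
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            switch (event.getEventType()) {
                case XMLStreamConstants.START_ELEMENT:
                    out.append('<').append(event.asStartElement().getName().getLocalPart()).append('>');
                    break;
                case XMLStreamConstants.CHARACTERS:
                    out.append(event.asCharacters().getData());
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    out.append("</").append(event.asEndElement().getName().getLocalPart()).append('>');
                    break;
                default:
                    break; // START_DOCUMENT, END_DOCUMENT, ... are ignored
            }
        }
        reader.close();
        return out.toString();
    }
}
```

Both methods produce the same output for the same input; the dispatch logic is structurally identical.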

The drawbacks of the stream API compared to the event API are, as you mention, that some methods of XMLStreamReader will throw an exception depending on the current event's type, and that the event is not represented as a data structure that can be passed directly to the next element in the pipeline or stored in an event buffer.

The first point (exceptions) should not happen unless the code is buggy and tries to get information that doesn't belong to the current context. I have used the cursor API many times and haven't found any usability problems with it.
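
A small sketch of both cases, assuming a standard StAX implementation such as the one bundled with the JDK (CursorGuards and its methods are hypothetical names). Calling getText() while positioned on a START_ELEMENT is exactly the kind of buggy access that throws IllegalStateException, while checking the event type first avoids it.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helpers showing the context-dependent accessors of XMLStreamReader.
class CursorGuards {
    // Buggy access: getText() is only defined for text-like events, so calling it
    // while the reader is on a START_ELEMENT throws IllegalStateException.
    static boolean unguardedThrows(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        reader.nextTag(); // move to the first START_ELEMENT
        try {
            reader.getText();
            return false;
        } catch (IllegalStateException expected) {
            return true;
        } finally {
            reader.close();
        }
    }

    // Correct access: check the current event before using a context-specific accessor.
    static String firstText(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                reader.next();
                if (reader.isCharacters()) {
                    return reader.getText();
                }
            }
            return null; // no character data in the document
        } finally {
            reader.close();
        }
    }
}
```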

The second point (lack of a data structure) can easily be solved by using an XMLEventAllocator [1] that creates an XMLEvent from the current state of an XMLStreamReader.
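
For illustration, a deliberately minimal XMLEventAllocator written by hand on top of XMLEventFactory, so that the sketch does not depend on an allocator being supplied by the StAX implementation. SimpleEventAllocator and snapshotAll are hypothetical names; the sketch only handles elements (attributes included, namespace declarations ignored), character data and document boundaries, and rejects other event types.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.Namespace;
import javax.xml.stream.events.XMLEvent;
import javax.xml.stream.util.XMLEventAllocator;
import javax.xml.stream.util.XMLEventConsumer;

// Hypothetical minimal allocator: snapshots the reader's current state into an
// XMLEvent built with XMLEventFactory. Only elements (with attributes, without
// namespace declarations), character data and document boundaries are handled.
class SimpleEventAllocator implements XMLEventAllocator {
    private final XMLEventFactory factory = XMLEventFactory.newInstance();

    @Override
    public XMLEventAllocator newInstance() {
        return new SimpleEventAllocator();
    }

    @Override
    public XMLEvent allocate(XMLStreamReader r) throws XMLStreamException {
        switch (r.getEventType()) {
            case XMLStreamConstants.START_ELEMENT: {
                List<Attribute> attrs = new ArrayList<>();
                for (int i = 0; i < r.getAttributeCount(); i++) {
                    attrs.add(factory.createAttribute(r.getAttributeName(i), r.getAttributeValue(i)));
                }
                return factory.createStartElement(r.getName(), attrs.iterator(),
                        Collections.<Namespace>emptyIterator());
            }
            case XMLStreamConstants.END_ELEMENT:
                return factory.createEndElement(r.getName(), Collections.<Namespace>emptyIterator());
            case XMLStreamConstants.CHARACTERS:
                return factory.createCharacters(r.getText());
            case XMLStreamConstants.START_DOCUMENT:
                return factory.createStartDocument();
            case XMLStreamConstants.END_DOCUMENT:
                return factory.createEndDocument();
            default:
                throw new XMLStreamException("Event type not handled by this sketch: " + r.getEventType());
        }
    }

    @Override
    public void allocate(XMLStreamReader r, XMLEventConsumer consumer) throws XMLStreamException {
        consumer.add(allocate(r));
    }

    // Usage: turn a cursor-based parse into a list of independent XMLEvents.
    static List<XMLEvent> snapshotAll(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        XMLEventAllocator allocator = new SimpleEventAllocator();
        List<XMLEvent> events = new ArrayList<>();
        events.add(allocator.allocate(reader)); // the initial START_DOCUMENT state
        while (reader.hasNext()) {
            reader.next();
            events.add(allocator.allocate(reader));
        }
        reader.close();
        return events;
    }
}
```

The point is that the consumer keeps using the cheap cursor API and only pays for an event object where one is actually needed.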

The event API has the major drawback of always creating a new object for every event (since, as the javadoc says, "events may be cached and referenced after the parse has completed"). This can put a big strain on the memory system and the garbage collector in a busy application.
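
A quick way to see this, as a sketch (EventRetention is a hypothetical name): collect every event from an XMLEventReader, close the reader, and check that all collected events are distinct objects that are still usable. Since events must stay valid after the parse, the reader cannot recycle a single mutable instance.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical check: events from XMLEventReader stay valid after the parse,
// so each call to nextEvent() has to hand out its own object.
class EventRetention {
    // Reads the whole document and keeps every event.
    static List<XMLEvent> collect(String xml) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        List<XMLEvent> events = new ArrayList<>();
        while (reader.hasNext()) {
            events.add(reader.nextEvent()); // one object per event
        }
        reader.close(); // the collected events remain usable afterwards
        return events;
    }

    // Counts distinct objects by identity (not equals()).
    static int distinctObjects(List<XMLEvent> events) {
        Set<XMLEvent> seen = Collections.newSetFromMap(new IdentityHashMap<XMLEvent, Boolean>());
        seen.addAll(events);
        return seen.size();
    }
}
```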

So the cursor API is the most efficient IMO when it comes to consuming data, since it doesn't require creating useless event objects.

Now, in a pipeline context, we will want to transmit events untouched from one component to the next, using some partial buffering as mentioned in earlier discussions. A FIFO of XMLEvent objects seems to be the natural solution for this, but it would require using events at the pipeline API level, with the associated costs mentioned above.
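
A minimal sketch of that event-level FIFO, assuming javax.xml.stream.util.XMLEventConsumer as the component interface (BufferingStage and run are hypothetical names, and the capacity-based flush is just one possible buffering policy). Every buffered event is a separate XMLEvent object, which is precisely the cost discussed above.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
import javax.xml.stream.util.XMLEventConsumer;

// Hypothetical pipeline stage: buffers up to `capacity` events in a FIFO
// and forwards them, untouched and in order, to the next component.
class BufferingStage implements XMLEventConsumer {
    private final Deque<XMLEvent> fifo = new ArrayDeque<>();
    private final int capacity;
    private final XMLEventConsumer next;

    BufferingStage(int capacity, XMLEventConsumer next) {
        this.capacity = capacity;
        this.next = next;
    }

    @Override
    public void add(XMLEvent event) throws XMLStreamException {
        fifo.addLast(event);
        if (fifo.size() >= capacity) {
            flush();
        }
    }

    // Forwards all buffered events downstream, oldest first.
    void flush() throws XMLStreamException {
        while (!fifo.isEmpty()) {
            next.add(fifo.removeFirst());
        }
    }

    // Usage: drive the stage from an XMLEventReader and record the element
    // names seen by the downstream component.
    static List<String> run(String xml, int capacity) throws XMLStreamException {
        final List<String> names = new ArrayList<>();
        XMLEventConsumer sink = new XMLEventConsumer() {
            @Override
            public void add(XMLEvent e) {
                if (e.isStartElement()) {
                    names.add(e.asStartElement().getName().getLocalPart());
                }
            }
        };
        BufferingStage stage = new BufferingStage(capacity, sink);
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        while (reader.hasNext()) {
            stage.add(reader.nextEvent());
        }
        stage.flush(); // push out whatever is left in the buffer
        reader.close();
        return names;
    }
}
```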

So what should be used for pipelines? My impression is that we should stick to the most efficient API and build the simple tools needed to buffer events from an XMLStreamReader, taking inspiration from the XMLBytestreamCompiler we already have.
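
A rough sketch of what such a tool could look like (StreamEventBuffer is a hypothetical class, not existing Cocoon code): it records events from the cursor API into a flat int array of opcodes plus a list of strings, in the spirit of compiling events into a compact form rather than keeping one object per event, and can replay them to any XMLStreamWriter. The name and text strings are still allocated by the reader, but there is no wrapper object per event. Namespaces, comments and processing instructions are ignored to keep it short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical compact buffer filled from the cursor API: one int opcode per
// event plus the strings it needs, instead of one XMLEvent object per event.
// Namespaces, comments and processing instructions are ignored in this sketch.
class StreamEventBuffer {
    private static final int START = 0, ATTR = 1, TEXT = 2, END = 3;
    private int[] ops = new int[16];
    private int size = 0;
    private final List<String> strings = new ArrayList<>();

    private void op(int code) {
        if (size == ops.length) {
            ops = Arrays.copyOf(ops, size * 2); // grow the opcode array
        }
        ops[size++] = code;
    }

    // Records the remaining events of the reader without creating XMLEvents.
    void record(XMLStreamReader reader) throws XMLStreamException {
        while (reader.hasNext()) {
            switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    op(START);
                    strings.add(reader.getLocalName());
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        op(ATTR);
                        strings.add(reader.getAttributeLocalName(i));
                        strings.add(reader.getAttributeValue(i));
                    }
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                    op(TEXT);
                    strings.add(reader.getText());
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    op(END);
                    break;
                default:
                    break; // everything else is dropped in this sketch
            }
        }
    }

    // Replays the buffered events to a writer; can be called any number of times.
    void replay(XMLStreamWriter writer) throws XMLStreamException {
        int s = 0; // read position in `strings`
        for (int i = 0; i < size; i++) {
            switch (ops[i]) {
                case START:
                    writer.writeStartElement(strings.get(s++));
                    break;
                case ATTR: {
                    String name = strings.get(s++);
                    String value = strings.get(s++);
                    writer.writeAttribute(name, value);
                    break;
                }
                case TEXT:
                    writer.writeCharacters(strings.get(s++));
                    break;
                case END:
                    writer.writeEndElement();
                    break;
                default:
                    throw new IllegalStateException("Unknown opcode " + ops[i]);
            }
        }
        writer.flush();
    }

    // Usage: record a document, then replay it as serialized XML.
    static String roundTrip(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        StreamEventBuffer buffer = new StreamEventBuffer();
        buffer.record(reader);
        reader.close();
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        buffer.replay(writer);
        writer.close();
        return out.toString();
    }
}
```

Because the buffer is independent of the reader once recorded, the same content can be replayed to several downstream writers, which is what a cache or partial buffering in the pipeline would need.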


Sylvain

[1] https://stax-utils.dev.java.net/nonav/javadoc/api/javax/xml/stream/util/XMLEventAllocator.html


--
Sylvain Wallez - http://bluxte.net
