Michael Seydl wrote:
Hi all!
One more mail for the student group! Behind this lurid topic hides our
evaluation of the latest XML processing technologies regarding their
usability in Cocoon3 (especially whether they are suited to be used in a
streaming pipeline).
As is commonly known, we decided to use StAX as our weapon of choice
to do the XML, but this paper should explain the whys and hows and
especially the way we took to come to our decision, which resulted in
using the very same API.
Eleven pages shouldn't be too big a read, and it contains all the
necessary links to the APIs we evaluated and likewise our two cents
about each API we looked at. In conclusion, we also tried to show the
difference between the currently used SAX API and the StAX API we
propose.
I hope this work sheds some light on our decision making and taking
and that someone dares to read it.
That's from me, I wish you all a pleasant and very merry Christmas!
Regards,
Michael Seydl
Good work and an interesting read, but I don't agree with some of its statements!
The big if/else or switch statements mentioned as a drawback of the
cursor API (XMLStreamReader) in 1.2.4 also apply to the event API, since
it provides abstract events whose type also needs to be inspected to
decide what to do.
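To illustrate the point, here is a minimal sketch that walks the same document with both APIs. The class and method names (DispatchComparison, withCursor, withEvents) are made up for this mail; only the javax.xml.stream calls are standard StAX. Both loops end up with the same switch on the event type.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.XMLEvent;

// Hypothetical demo class, not part of Cocoon or StAX.
class DispatchComparison {

    // Cursor API: next() returns an int event type that we switch on.
    static String withCursor(String xml) {
        StringBuilder out = new StringBuilder();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                switch (r.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        out.append('<').append(r.getLocalName()).append('>');
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        out.append(r.getText());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        out.append("</").append(r.getLocalName()).append('>');
                        break;
                    default:
                        break; // other event types ignored in this sketch
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
        return out.toString();
    }

    // Event API: each XMLEvent still has to be inspected with the same switch.
    static String withEvents(String xml) {
        StringBuilder out = new StringBuilder();
        try {
            XMLEventReader r = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            while (r.hasNext()) {
                XMLEvent e = r.nextEvent();
                switch (e.getEventType()) {
                    case XMLStreamConstants.START_ELEMENT:
                        out.append('<')
                           .append(e.asStartElement().getName().getLocalPart())
                           .append('>');
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        out.append(e.asCharacters().getData());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        out.append("</")
                           .append(e.asEndElement().getName().getLocalPart())
                           .append('>');
                        break;
                    default:
                        break; // other event types ignored in this sketch
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<a>hi<b/></a>";
        System.out.println(withCursor(xml));
        System.out.println(withEvents(xml));
    }
}
```

The only real difference is where the data comes from: the reader itself in the first loop, a freshly created event object in the second.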
The drawbacks of the cursor API compared to the event API are, as you
mention, that some methods of XMLStreamReader will throw an exception
depending on the current event's type and that the event is not
represented as a data structure that can be passed directly to the next
element in the pipeline or stored in an event buffer.
The first point (exceptions) should not happen unless the code is buggy
and tries to get information that doesn't belong to the current context.
I have used the cursor API many times and haven't found any usability
problems with it.
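As a small sketch of what I mean (class and method names are made up for this mail): XMLStreamReader.getLocalName() is documented to throw IllegalStateException when the cursor is not on a START_ELEMENT, END_ELEMENT or ENTITY_REFERENCE, so asking for an element name while positioned on text is a bug in the calling code, and a simple check of the current event avoids it.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of Cocoon or StAX.
class CursorContext {

    // Advances a new reader to the first CHARACTERS event.
    // Assumes the document contains some text; otherwise next() fails at the end.
    static XMLStreamReader atFirstText(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        while (r.next() != XMLStreamConstants.CHARACTERS) {
            // skip everything before the first text node
        }
        return r;
    }

    // Buggy usage: asks for an element name while the cursor is on text.
    static boolean misuseThrows(String xml) {
        try {
            XMLStreamReader r = atFirstText(xml);
            r.getLocalName(); // wrong context
            return false;
        } catch (IllegalStateException expected) {
            return true;
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
    }

    // Correct usage: only ask for the name when the current event has one.
    static String nameOrNull(XMLStreamReader r) {
        return (r.isStartElement() || r.isEndElement()) ? r.getLocalName() : null;
    }

    static String guardedNameAtFirstText(String xml) {
        try {
            return nameOrNull(atFirstText(xml));
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("misuse throws: " + misuseThrows("<a>hi</a>"));
        System.out.println("guarded name: " + guardedNameAtFirstText("<a>hi</a>"));
    }
}
```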
The second point (lack of data structure) can be easily solved by using
an XMLEventAllocator [1] that creates an XMLEvent from the current state
of an XMLStreamReader.
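A rough sketch of the idea, under some assumptions: StAX implementations normally ship their own allocator, so the SimpleAllocator below is purely hypothetical and only there to show the mechanism. It implements the standard XMLEventAllocator interface on top of XMLEventFactory, handles only a handful of event types, and ignores namespace declarations.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.Namespace;
import javax.xml.stream.events.XMLEvent;
import javax.xml.stream.util.XMLEventAllocator;
import javax.xml.stream.util.XMLEventConsumer;

// Hypothetical minimal allocator: snapshots the reader's current state as an XMLEvent.
class SimpleAllocator implements XMLEventAllocator {
    private final XMLEventFactory factory = XMLEventFactory.newInstance();

    @Override
    public XMLEventAllocator newInstance() {
        return new SimpleAllocator();
    }

    @Override
    public XMLEvent allocate(XMLStreamReader r) throws XMLStreamException {
        switch (r.getEventType()) {
            case XMLStreamConstants.START_DOCUMENT:
                return factory.createStartDocument();
            case XMLStreamConstants.START_ELEMENT: {
                List<Attribute> attrs = new ArrayList<>();
                for (int i = 0; i < r.getAttributeCount(); i++) {
                    attrs.add(factory.createAttribute(
                            r.getAttributeName(i), r.getAttributeValue(i)));
                }
                // Namespace declarations are left out of this sketch.
                return factory.createStartElement(r.getName(), attrs.iterator(),
                        Collections.<Namespace>emptyIterator());
            }
            case XMLStreamConstants.CHARACTERS:
                return factory.createCharacters(r.getText());
            case XMLStreamConstants.END_ELEMENT:
                return factory.createEndElement(r.getName(),
                        Collections.<Namespace>emptyIterator());
            case XMLStreamConstants.END_DOCUMENT:
                return factory.createEndDocument();
            default:
                throw new XMLStreamException(
                        "Event type not handled in this sketch: " + r.getEventType());
        }
    }

    @Override
    public void allocate(XMLStreamReader r, XMLEventConsumer consumer)
            throws XMLStreamException {
        consumer.add(allocate(r));
    }

    // Demo helper: drives a cursor over the document and snapshots every state.
    static List<XMLEvent> snapshot(String xml) {
        try {
            XMLInputFactory f = XMLInputFactory.newInstance();
            f.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader r = f.createXMLStreamReader(new StringReader(xml));
            SimpleAllocator alloc = new SimpleAllocator();
            List<XMLEvent> events = new ArrayList<>();
            events.add(alloc.allocate(r)); // the reader starts on START_DOCUMENT
            while (r.hasNext()) {
                r.next();
                events.add(alloc.allocate(r));
            }
            return events;
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        for (XMLEvent e : snapshot("<a x=\"1\">hi</a>")) {
            System.out.println(e.getEventType());
        }
    }
}
```

The point is that event objects only get created where a component actually asks for one, instead of for every event in the stream.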
The event API has the major drawback of always creating a new object for
every event (since as the javadoc says "events may be cached and
referenced after the parse has completed"). This can put a big strain
on memory and garbage collection in a busy application.
So the cursor API is the most efficient IMO when it comes to consuming
data, since it doesn't require creating useless event objects.
Now in a pipeline context, we will want to transmit events untouched
from one component to the next one, using some partial buffering as
mentioned in earlier discussions. A FIFO of XMLEvent objects seems to be
the natural solution for this, but would require the use of events at
the pipeline API level, with their associated costs mentioned above.
So what should be used for pipelines? My impression is that we should
stick to the most efficient API and build the simple tools needed to
buffer events from an XMLStreamReader, taking inspiration from the
XMLBytestreamCompiler we already have.
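To make this a bit more concrete, here is a very rough sketch of such a buffering tool. Everything in it is an assumption for the sake of discussion: the CursorBuffer class is hypothetical, it is not how XMLBytestreamCompiler works, and it records only element names and text (no attributes, namespaces, comments or processing instructions). It stores an int and a String per event in parallel arrays instead of an XMLEvent object; whether that is actually cheaper would have to be measured.

```java
import java.io.StringReader;
import java.util.Arrays;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical buffer: keeps cursor state in parallel arrays, in arrival order.
class CursorBuffer {
    private int[] types = new int[16];
    private String[] values = new String[16];
    private int size;

    // Records the reader's current event; unsupported types are skipped.
    void record(XMLStreamReader r) {
        switch (r.getEventType()) {
            case XMLStreamConstants.START_ELEMENT:
            case XMLStreamConstants.END_ELEMENT:
                add(r.getEventType(), r.getLocalName());
                break;
            case XMLStreamConstants.CHARACTERS:
                add(r.getEventType(), r.getText());
                break;
            default:
                break; // ignored in this sketch
        }
    }

    private void add(int type, String value) {
        if (size == types.length) {
            // Grow both arrays together so indexes stay aligned.
            types = Arrays.copyOf(types, size * 2);
            values = Arrays.copyOf(values, size * 2);
        }
        types[size] = type;
        values[size] = value;
        size++;
    }

    int size() {
        return size;
    }

    // Replays the buffered events in FIFO order as simplified markup.
    String replay() {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < size; i++) {
            switch (types[i]) {
                case XMLStreamConstants.START_ELEMENT:
                    out.append('<').append(values[i]).append('>');
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    out.append("</").append(values[i]).append('>');
                    break;
                default:
                    out.append(values[i]); // CHARACTERS
                    break;
            }
        }
        return out.toString();
    }

    // Demo helper: buffers a whole document from a cursor.
    static CursorBuffer fill(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            CursorBuffer buffer = new CursorBuffer();
            while (r.hasNext()) {
                r.next();
                buffer.record(r);
            }
            r.close();
            return buffer;
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fill("<a>hi<b/></a>").replay());
    }
}
```

A real pipeline buffer would replay into the next component rather than into a string, and would only hold a partial window of the stream.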
Sylvain
[1]
https://stax-utils.dev.java.net/nonav/javadoc/api/javax/xml/stream/util/XMLEventAllocator.html
--
Sylvain Wallez - http://bluxte.net