Steven Dolg wrote:
Sylvain Wallez wrote:
<snip/>
Steven Dolg wrote:
Basically you're providing a buffer between every pair of components
and filling it as needed.
Yes. Now this buffer will always contain a very limited number of
events, corresponding to the result of processing an amount of input
data that is convenient to process at once to avoid complex state
management (e.g. an <i18n:text> tag with all its children). And so
most often, this buffer will contain just one event.
Think of it as being just a bridge between the writer view used by a
producer and the reader view used by its consumer. These are in my
opinion the most convenient views to write StAX components.
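To make this a bit more concrete, here's a rough sketch of the idea (all
names are made up; a real StaxFIFO would of course implement the full
XMLStreamWriter/XMLStreamReader contracts, and for clarity this version
still buffers tiny event objects):

import java.util.ArrayDeque;
import java.util.Queue;
import javax.xml.stream.XMLStreamConstants;

// Simplified bridge between a producer's writer view and a consumer's
// reader view. Only a handful of event kinds are shown.
final class StaxFifoSketch {

    static final class Event {
        final int type;     // one of XMLStreamConstants.*
        final String name;  // element name, or null
        final String text;  // character data, or null
        Event(int type, String name, String text) {
            this.type = type; this.name = name; this.text = text;
        }
    }

    private final Queue<Event> pending = new ArrayDeque<Event>();

    // --- writer view: what the producing component calls ---
    void writeStartElement(String localName) {
        pending.add(new Event(XMLStreamConstants.START_ELEMENT, localName, null));
    }
    void writeCharacters(String text) {
        pending.add(new Event(XMLStreamConstants.CHARACTERS, null, text));
    }
    void writeEndElement(String localName) {
        pending.add(new Event(XMLStreamConstants.END_ELEMENT, localName, null));
    }

    // --- reader view: what the consuming component calls ---
    boolean hasNext() { return !pending.isEmpty(); }
    Event next()      { return pending.remove(); }
}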
But you need to implement both XMLStreamWriter and XMLStreamReader
and optimize them for anything a transformer might do.
In order to buffer all the data from the components you will have to
create some objects as well - I guess you will end up with something
like XMLEvent and maintain a list of them in the StaxFIFO.
That's why I think an efficient (as in faster than the Event API)
implementation of the StaxFIFO is difficult to make.
It's certainly less trivial than maintaining a list of events, but
should be doable quite efficiently by using an int FIFO (to store
event types and attribute counts) and a String FIFO (for everything
else). I'll try to find a couple of hours to prototype this.
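Something along these lines, completely untested and with made-up names
- a real implementation would use an unboxed int ring buffer instead of
ArrayDeque<Integer>:

import java.util.ArrayDeque;
import javax.xml.stream.XMLStreamConstants;

// Event types and attribute counts go into one queue, all strings
// (element names, attribute names/values, text) into another, so no
// per-event object is ever created.
final class PrimitiveStaxFifo {

    private final ArrayDeque<Integer> ints    = new ArrayDeque<Integer>();
    private final ArrayDeque<String>  strings = new ArrayDeque<String>();

    // --- writer side ---
    void addStartElement(String name, String[] attrNames, String[] attrValues) {
        ints.add(XMLStreamConstants.START_ELEMENT);
        ints.add(attrNames.length);          // attribute count follows the type
        strings.add(name);
        for (int i = 0; i < attrNames.length; i++) {
            strings.add(attrNames[i]);
            strings.add(attrValues[i]);
        }
    }
    void addCharacters(String text) {
        ints.add(XMLStreamConstants.CHARACTERS);
        strings.add(text);
    }
    void addEndElement(String name) {
        ints.add(XMLStreamConstants.END_ELEMENT);
        strings.add(name);
    }

    // --- reader side: pops values in exactly the order they were added ---
    int     nextInt()    { return ints.remove(); }
    String  nextString() { return strings.remove(); }
    boolean isEmpty()    { return ints.isEmpty(); }
}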
On the other hand I do think that the cursor API is quite a bit
harder to use.
As stated in the Javadoc of XMLStreamReader, it is the lowest-level
API for reading XML data - which usually means more logic is needed in
the code using the API, and more knowledge is required of the developer
reading or writing that code.
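Just to illustrate what I mean: this is the kind of loop every consuming
component has to drive itself with the cursor API (plain
javax.xml.stream, nothing Cocoon-specific):

import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class CursorApiExample {
    public static void main(String[] args) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader("<doc><title>Hi</title></doc>"));
        // The caller drives the cursor and has to remember where it is.
        while (reader.hasNext()) {
            switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    System.out.println("start: " + reader.getLocalName());
                    break;
                case XMLStreamConstants.CHARACTERS:
                    System.out.println("text:  " + reader.getText());
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    System.out.println("end:   " + reader.getLocalName());
                    break;
                default:
                    break;
            }
        }
    }
}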
So I second Andreas' statement that we will sacrifice simplicity for
(a small amount of ?) performance.
I understand your point, even if I don't totally agree :-) Now it
should be mentioned that even with events, my proposal still
stands: just replace XMLStream{Reader|Writer} with
XMLEvent{Reader|Writer}.
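For comparison, the same loop with the event API - every call hands back
a self-contained XMLEvent object, which is easier to pass around but
costs one allocation per event:

import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class EventApiExample {
    public static void main(String[] args) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader("<doc><title>Hi</title></doc>"));
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();  // one object per event
            if (event.isStartElement()) {
                System.out.println("start: " + event.asStartElement().getName().getLocalPart());
            } else if (event.isCharacters()) {
                System.out.println("text:  " + event.asCharacters().getData());
            } else if (event.isEndElement()) {
                System.out.println("end:   " + event.asEndElement().getName().getLocalPart());
            }
        }
    }
}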
The other thing is that - at least the way you suggested - we would
need a special implementation of the Pipeline interface.
That is something that compromises the intention behind having a
Pipeline API.
Right now we can use the new StAX components and simply put them
into any of the Pipeline implementations we already have.
Sacrificing this is completely out of the question IMO.
Actually, I'm wondering if wanting a single API is not wishful
thinking and will in the end lead to something that is overly
abstract and hence difficult to understand and use, or where
underlying implementations will leak into the high-level abstraction.
There is already some impedance mismatch appearing between pull and
push in the code:
- a StAXGenerator has to call initiatePullProcessing() on its
consumer, which in turn will have to call it on its own consumer,
etc. until we reach the Finisher that will finally start pulling
events. This moves a responsibility that belongs to the pipeline down
to its components.
Well I don't see the problem with that.
From the pipeline's point of view those are normal components just
like all the others.
The pipeline was never intended to "care" about the internals of the
components - so why worry that the StAXGenerator calls
"initiatePullProcessing" on its consumer instead of calling some other
method like e.g. "startDocument".
Hmm... the fact that every implementation has to copy/paste the exact
same call to initiatePullProcessing (or has to extend a common abstract
class that does it) because the pipeline expects processing to be
started on the first component is a sign of a design problem to me.
Some responsibilities of the pipeline creep into its components because
the pipeline is too abstract, or because there's an intermediate
adaptation layer that's missing.
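In code, the pattern I'm talking about looks roughly like this
(everything except initiatePullProcessing() is a made-up name for the
sake of the example):

// Every intermediate component has to repeat, or inherit, this exact
// delegation until the last component in the chain actually starts pulling.
interface PullConsumer {
    void initiatePullProcessing();
}

abstract class AbstractPullTransformer implements PullConsumer {

    private PullConsumer consumer;

    public void setConsumer(PullConsumer consumer) {
        this.consumer = consumer;
    }

    // The boilerplate in question: just pass the call down the chain.
    public void initiatePullProcessing() {
        consumer.initiatePullProcessing();
    }
}

class Finisher implements PullConsumer {

    // Only here does anything actually happen: the finisher starts
    // pulling events back up through its parent producers.
    public void initiatePullProcessing() {
        // pull loop would go here
    }
}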
- an AbstractStAXProducer only accepts a StAXConsumer, defeating the
idea of a unified pipeline implementation that will accept everything.
The idea was to have pipelines capable of processing virtually
any data.
But that is not the same as combining components in an arbitrary way,
e.g. there is no sense in linking a FileGenerator with a (not yet
existing) ImageTransformer based on Java's Imaging API.
The components must be "compatible" - that is they must understand the
data they exchange with each other.
We may however provide some adapters/converters to make certain
"types" of components compatible, e.g. SAX <--> StAX.
So we should either have several APIs specifically tailored to the
underlying push or pull model, or make sure the unified API and its
implementations accept any kind of component and set the appropriate
conversion bridges between them.
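One such bridge can be sketched with the standard APIs alone - pull from
a StAX XMLStreamReader and push into a SAX ContentHandler (namespace
prefixes and attributes are glossed over here, and the qualified name is
approximated by the local name):

import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Pulls StAX events and pushes the corresponding SAX events.
final class StaxToSaxAdapter {

    void pump(XMLStreamReader reader, ContentHandler handler)
            throws XMLStreamException, SAXException {
        handler.startDocument();
        while (reader.hasNext()) {
            switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT: {
                    String uri = reader.getNamespaceURI();
                    handler.startElement(uri == null ? "" : uri, reader.getLocalName(),
                            reader.getLocalName(), new AttributesImpl());
                    break;
                }
                case XMLStreamConstants.CHARACTERS: {
                    handler.characters(reader.getTextCharacters(),
                            reader.getTextStart(), reader.getTextLength());
                    break;
                }
                case XMLStreamConstants.END_ELEMENT: {
                    String uri = reader.getNamespaceURI();
                    handler.endElement(uri == null ? "" : uri, reader.getLocalName(),
                            reader.getLocalName());
                    break;
                }
                default:
                    break;
            }
        }
        handler.endDocument();
    }
}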
As I tried to state above: that will not be possible for every
conceivable combination of components.
At least not when thinking beyond XML - which I do.
Again I doubt the real value of a common unified pipeline if all the
responsibility of ensuring proper compatibility between components
(including possible data conversion) is delegated to components. This
leaves a lot of complexity to component implementers (except in the
simple straightforward push scenario), and the features of the pipeline
will be limited to linking the components together and caching.
Furthermore, people will have to take great care of choosing components
that fit together, or they will get exceptions at pipeline execution
time. Hmm... reminds me of some criticism about the StAX stream API :-D
So let's agree to disagree. I'll see what you guys come up with and
hope I'll change my mind then.
Sylvain
--
Sylvain Wallez - http://bluxte.net