Andreas Pieber wrote:
First of all, my name is Andreas and I'm one of the students working on the StAX implementation for Cocoon. So, hello from my colleagues and me.

Hi Andreas and colleagues!

Secondly, this is my first post ever to the mailing list of an open source project, and I have such a long post to answer. Thank you Sylvain ;) Nevertheless, I'm going to try my best.

Doh, sorry for that. But at least this brought some material for the discussion :-P

We (when I say we, I mean us students, strongly influenced by Reinhard and Steven :)) also thought about the problems you described and came to the same conclusion.

Good to hear!

Therefore we're trying another approach: pulling StAX XmlEvents through the entire pipeline, starting from the end.
In other words, if we have a simple pipe of the following form:

Producer - Transformer - Serializer

the Serializer would have in its start method some code like:

while (parent.hasNext()) {
    xmlOutputWriter.add(parent.getNext());
}

This retrieves the next event from the Transformer (in this case) and writes it into an XmlOutputWriter. The Transformer in turn calls the getNext method on the Starter (in this case), which retrieves the XmlEvents directly from the XmlInputReader.

In this approach the Transformer needs (of course) some kind of buffer, since it may produce a lot of new content in response to a single event from its parent. This content is retrieved only one event at a time, each time the next pipeline component calls getNext, which explains the need for a buffer.

Of course, this buffer and some more helper code have to be provided, to avoid code duplication and to help the developer.
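
As a rough illustration of the buffering described above, here is a minimal sketch built on the standard javax.xml.stream event API. The class name BufferingPullTransformer, the run helper and the "insert a marker after every <item> start tag" rule are assumptions made up for this example; only hasNext/getNext mirror the names used in the thread.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical pull-style transformer; names are illustrative, not Cocoon API.
class BufferingPullTransformer {
    private final XMLEventReader parent;
    // Holds output events that were produced but not yet pulled downstream.
    private final Deque<XMLEvent> buffer = new ArrayDeque<>();
    private final XMLEventFactory factory = XMLEventFactory.newInstance();

    BufferingPullTransformer(XMLEventReader parent) {
        this.parent = parent;
    }

    boolean hasNext() {
        return !buffer.isEmpty() || parent.hasNext();
    }

    XMLEvent getNext() throws XMLStreamException {
        if (buffer.isEmpty()) {
            XMLEvent in = parent.nextEvent();
            buffer.add(in);
            // Illustrative expansion: one <item> start tag yields two output events.
            if (in.isStartElement()
                    && "item".equals(in.asStartElement().getName().getLocalPart())) {
                buffer.add(factory.createCharacters("*"));
            }
        }
        return buffer.poll();
    }

    // Plays the role of the serializer loop from the message above.
    static String run(String xml) {
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            StringWriter out = new StringWriter();
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
            BufferingPullTransformer transformer = new BufferingPullTransformer(reader);
            while (transformer.hasNext()) {
                writer.add(transformer.getNext());
            }
            writer.flush();
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The downstream component never sees the buffer: it just keeps calling getNext, and the extra event is handed out on the following call.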

I thought about that approach as well, but it doesn't avoid state management, which is the main complexity that Stax is supposed to solve. This is still callback-based processing, although here we have pull callbacks rather than push callbacks.

Now you're right: a single pull callback can consume several related input events, thus making it easy to process a subtree of several closely related elements from the input. It would for example radically simplify the implementation of the I18nTransformer, where <i18n:translate> and <i18n:choose> have a nested structure.

But in many situations the elements of interest to a transformer enclose large document sections that are to be propagated without modification. Examples are JXTemplateTransformer or FormsTransformer (but does anybody still use these instead of their generator replacements?), RoleFilterTransformer, SQLTransformer, LuceneIndexTransformer, MailTransformer, etc.

In that case, if we want to avoid processing the full input when reacting to a start element in order to keep the benefits of streaming, we have to use state management very similar to what would be needed for a SAX implementation.

I also have the feeling that because of the need for state management, we'll end up with quite complex structures, because of the mix of a callback and state automata approach with the pull approach where state is kept in the method calls stack and local variables.
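
To make the concern concrete, here is a small sketch of what such state looks like in a pull callback. Everything in it is an assumption for illustration: the StatefulPullTransformer class, the <shout> element and the "uppercase the text inside it" rule. The point is only that the content enclosed by <shout> is streamed through one event per call, so the "am I inside?" information has to live in a field, much like it would in a SAX handler, rather than in local variables on the call stack.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Locale;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical transformer: uppercases text inside <shout>, passes everything else through.
class StatefulPullTransformer {
    private final XMLEventReader parent;
    private final XMLEventFactory factory = XMLEventFactory.newInstance();
    // State that must survive between getNext() calls, as in a SAX handler.
    private int shoutDepth = 0;

    StatefulPullTransformer(XMLEventReader parent) {
        this.parent = parent;
    }

    boolean hasNext() {
        return parent.hasNext();
    }

    XMLEvent getNext() throws XMLStreamException {
        XMLEvent event = parent.nextEvent();
        if (event.isStartElement() && isShout(event.asStartElement().getName())) {
            shoutDepth++;
        } else if (event.isEndElement() && isShout(event.asEndElement().getName())) {
            shoutDepth--;
        } else if (event.isCharacters() && shoutDepth > 0) {
            String text = event.asCharacters().getData();
            return factory.createCharacters(text.toUpperCase(Locale.ROOT));
        }
        return event; // enclosed markup is propagated unchanged, one event per call
    }

    private static boolean isShout(QName name) {
        return "shout".equals(name.getLocalPart());
    }

    static String run(String xml) {
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            StringWriter out = new StringWriter();
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
            StatefulPullTransformer transformer = new StatefulPullTransformer(reader);
            while (transformer.hasNext()) {
                writer.add(transformer.getNext());
            }
            writer.flush();
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The alternative, consuming the whole <shout> subtree inside a single getNext call, would keep the state in local variables but would also buffer the entire enclosed section, which is exactly the loss of streaming mentioned above.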

Now I'd love to be proven wrong, since after considering these issues I've never actually experimented with this approach.

One big "problem" with this approach is that the "flow direction of events" is completely inverted. This means that StAX and SAX components would not be able to work "directly" together. But a push-pull approach would also require a conversion between StAX and SAX events, and furthermore this problem could be tackled by writing wrappers or adapters around the SAX components and adding them to a StAX pipe.

Absolutely. Converting Stax to SAX is fairly trivial, but the other way around requires buffering or multithreading. Have you looked at Stax-Utils [1]? It contains many classes to ease the SAX <-> Stax translation.
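
For the easy direction, a hand-rolled sketch could look like the following. It is not taken from Stax-Utils; the StaxToSaxBridge class and its trace helper are made up for this example, and it deliberately ignores prefix mappings, comments and processing instructions. It only shows why this direction needs neither buffering nor threads: the converter owns the loop, pulls from the Stax reader and pushes into the SAX handler.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical minimal bridge: pulls Stax events and pushes them as SAX callbacks.
class StaxToSaxBridge {
    static void replay(XMLStreamReader in, ContentHandler out)
            throws XMLStreamException, SAXException {
        out.startDocument();
        while (in.hasNext()) {
            switch (in.next()) {
                case XMLStreamConstants.START_ELEMENT: {
                    AttributesImpl atts = new AttributesImpl();
                    for (int i = 0; i < in.getAttributeCount(); i++) {
                        String local = in.getAttributeLocalName(i);
                        atts.addAttribute(orEmpty(in.getAttributeNamespace(i)), local,
                                qName(in.getAttributePrefix(i), local),
                                "CDATA", in.getAttributeValue(i));
                    }
                    out.startElement(orEmpty(in.getNamespaceURI()), in.getLocalName(),
                            qName(in.getPrefix(), in.getLocalName()), atts);
                    break;
                }
                case XMLStreamConstants.END_ELEMENT:
                    out.endElement(orEmpty(in.getNamespaceURI()), in.getLocalName(),
                            qName(in.getPrefix(), in.getLocalName()));
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                    out.characters(in.getTextCharacters(), in.getTextStart(),
                            in.getTextLength());
                    break;
                default:
                    break; // comments, PIs, etc. are skipped in this sketch
            }
        }
        out.endDocument();
    }

    private static String orEmpty(String s) {
        return s == null ? "" : s;
    }

    private static String qName(String prefix, String local) {
        return (prefix == null || prefix.isEmpty()) ? local : prefix + ":" + local;
    }

    // Demo helper: records the SAX callbacks as a compact string.
    static String trace(String xml) {
        StringBuilder log = new StringBuilder();
        DefaultHandler recorder = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                log.append('<').append(qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    log.append(' ').append(atts.getQName(i))
                       .append("='").append(atts.getValue(i)).append('\'');
                }
                log.append('>');
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                log.append("</").append(qName).append('>');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                log.append(ch, start, length);
            }
        };
        try {
            replay(XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml)), recorder);
        } catch (XMLStreamException | SAXException e) {
            throw new IllegalStateException(e);
        }
        return log.toString();
    }
}
```

The reverse direction has no such simple loop: a SAX parser drives its handler to completion, so serving events on demand means either recording them all first or running the parser in a separate thread.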

At the moment we're developing a prototype for such a "pull only pipe" to get some experience with it.

Even if I may seem a bit negative above, keep up this work. As I said, I haven't actually experimented with Stax-based state management, so maybe my feelings are wrong, and I'm very interested in seeing what you can come up with.

Now there's one very interesting use case for Stax we should not forget: communication with remote APIs in an XML-RPC style, where the response body contains both status information useful to a controller and actual data that can be used by a pipeline. In that case, the application controller should be able to pull a few events from the response until it has all the necessary information to decide what to do next, and then replay the full response event stream into a pipeline.

A typical example is the Flickr "REST" response [2], which BTW is actually not REST at all since the status code is in the response body rather than in the HTTP status. A typical controller for this API would be:

 InputStream flickrResponse = callFlickerAPI("foo");
 PushBackStreamReader input = new PushBackStreamReader(flickrResponse);
 // move to the root element of the response
 input.nextTag();
 // Flickr reports success as <rsp stat="ok">
 if ("ok".equals(input.getAttributeValue(null, "stat"))) {
     // go back to the first event in the stream
     input.reset();
     Pipeline pipe = new Pipeline();
     pipe.setGenerator(input);
     ... build the pipeline and run it ...
 } else {
     sendErrorResponse("Flickr failed");
 }

(note that in "pipe.setGenerator(input)" I don't care if the pipeline is Stax-based or SAX-based with a Stax to SAX converter)
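
One possible way to get the "peek, then replay" behaviour is to record the events the controller pulls and hand them out again once it rewinds. The sketch below is only that, a sketch: ReplayableEventSource and its handle method are hypothetical names, it uses the javax.xml.stream event API rather than a stream reader, and the pipeline is reduced to a plain serializer loop. The attribute name stat follows Flickr's documented response format (<rsp stat="ok">).

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.XMLEvent;

// Hypothetical event source that records what the controller pulls and can replay it.
class ReplayableEventSource {
    private final XMLEventReader source;
    private final Deque<XMLEvent> recorded = new ArrayDeque<>();
    private boolean replaying = false;

    ReplayableEventSource(XMLEventReader source) {
        this.source = source;
    }

    boolean hasNext() {
        return (replaying && !recorded.isEmpty()) || source.hasNext();
    }

    XMLEvent next() throws XMLStreamException {
        if (replaying) {
            // Serve recorded events first, then continue with the live stream.
            XMLEvent buffered = recorded.poll();
            return buffered != null ? buffered : source.nextEvent();
        }
        XMLEvent event = source.nextEvent();
        recorded.add(event);
        return event;
    }

    // Go back to the first event; only the events pulled so far were buffered.
    void reset() {
        replaying = true;
    }

    // Controller sketch: peek at the root element, then replay the whole stream.
    static String handle(String responseXml) {
        try {
            ReplayableEventSource input = new ReplayableEventSource(
                    XMLInputFactory.newInstance()
                            .createXMLEventReader(new StringReader(responseXml)));
            XMLEvent event = input.next();
            while (!event.isStartElement()) {
                event = input.next();
            }
            Attribute stat = event.asStartElement().getAttributeByName(new QName("stat"));
            if (stat == null || !"ok".equals(stat.getValue())) {
                return "Flickr failed";
            }
            input.reset();
            StringWriter out = new StringWriter();
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
            while (input.hasNext()) {
                writer.add(input.next());
            }
            writer.flush();
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Only the handful of events consumed before the decision are held in memory; the rest of the response is still streamed.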

I hope I was able to get across the nub of our thoughts. So, what do you think?

Yes, you got it! And sorry for throwing a large email at you for your first participation :-)

But you'll quickly learn that cocoon-dev is a friendly place where everybody can voice their opinions... and have them challenged :-P

Sylvain

[1] http://stax-utils.dev.java.net/
[2] http://www.flickr.com/services/api/response.rest.html

--
Sylvain Wallez - http://bluxte.net
