Re: [cocoon3] Stax Pipelines

Sylvain Wallez Tue, 02 Dec 2008 23:57:28 -0800

Andreas Pieber wrote:

First of all, my name is Andreas and I'm one of the students working on the StAX implementation for Cocoon. Therefore, hello from my colleagues and me.


Hi Andreas and colleagues!

Secondly, this is my first post ever to the mailing list of an open source project, and such a long post to answer. Thank you Sylvain ;) Nevertheless I'm going to try my best.

Doh, sorry for that. But at least this brought some material for the discussion :-P

We (when I say we, I mean us students, strongly influenced by Reinhard and Steven :)) also thought about the problems you described and came to the same conclusion.


Good to hear!

Therefore we're trying another approach: pulling StAX-XmlEvents through the entire pipeline from the end.
In other words, if we have a simple pipe of the following form:

Producer - Transformer - Serializer

the Serializer would have in its start method some code like:

while(parent.hasNext()){
        xmlOutputWriter.add(parent.getNext());
}
retrieving the next event from the Transformer in this case and writing it into an XmlOutputWriter. The Transformer in turn calls the getNext method on the Starter (in this case), which retrieves the XmlEvents directly from the XmlInputReader.
In this approach the Transformer (of course) needs some kind of buffer, since in response to one event from the parent the transformer may produce a lot of new content. This content is only retrieved one event at a time, as the next pipeline component calls getNext, which explains the need for some kind of buffer.
Of course this buffer and some more helper code have to be provided to avoid code duplication and to help the developer.
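
A minimal sketch of the pull-only pipe described above, using only the JDK's javax.xml.stream API. The PullStage interface, the class names and the <item>/<sep> elements are assumptions made for illustration; they are not part of Cocoon 3 or of the students' prototype. The transformer answers one parent event with up to three output events, which is why it needs the buffer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

/** Hypothetical contract for a pull stage; not a Cocoon 3 interface. */
interface PullStage {
    boolean hasNext();
    XMLEvent next() throws XMLStreamException;
}

/** Starter: hands out events straight from the parser. */
final class ReaderStage implements PullStage {
    private final XMLEventReader reader;
    ReaderStage(XMLEventReader reader) { this.reader = reader; }
    public boolean hasNext() { return reader.hasNext(); }
    public XMLEvent next() throws XMLStreamException { return reader.nextEvent(); }
}

/** Transformer: appends an empty <sep> element after every </item>. */
final class SeparatorStage implements PullStage {
    private static final QName ITEM = new QName("item");
    private final XMLEventFactory events = XMLEventFactory.newInstance();
    private final Deque<XMLEvent> buffer = new ArrayDeque<>();
    private final PullStage parent;

    SeparatorStage(PullStage parent) { this.parent = parent; }

    public boolean hasNext() { return !buffer.isEmpty() || parent.hasNext(); }

    public XMLEvent next() throws XMLStreamException {
        if (buffer.isEmpty()) {
            // Pull exactly one event from the parent and queue everything it produces.
            XMLEvent in = parent.next();
            buffer.add(in);
            if (in.isEndElement() && ITEM.equals(in.asEndElement().getName())) {
                buffer.add(events.createStartElement("", "", "sep"));
                buffer.add(events.createEndElement("", "", "sep"));
            }
        }
        // Hand the queued events to the next stage one at a time.
        return buffer.remove();
    }
}

final class PullPipelineSketch {
    /** Serializer loop: drives the whole pipe by pulling from the last stage. */
    static String run(String xml) {
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            PullStage last = new SeparatorStage(new ReaderStage(reader));
            StringWriter out = new StringWriter();
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
            while (last.hasNext()) {
                writer.add(last.next());
            }
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<list><item>a</item><item>b</item></list>"));
    }
}
```

Nothing is parsed until the serializer loop asks for an event, so the Starter only reads as far as the downstream stages have pulled.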

I thought about that approach as well, but it doesn't avoid state management, which is the main complexity that Stax is supposed to solve. This is still callback-based processing, although we have here pull callbacks rather than push callbacks.

Now you're right: a single pull callback can consume several input events that are related, thus making it easy to process a subtree of several closely related elements from the input. It would for example radically simplify the implementation of the I18nTransformer, where <i18n:translate> and <i18n:choose> have a nested structure.

But in many situations the elements of interest to a transformer enclose large document sections that are to be propagated without modification. Examples are JXTemplateTransformer or FormsTransformer (but does anybody still use these instead of their generator replacements?), RoleFilterTransformer, SQLTransformer, LuceneIndexTransformer, MailTransformer, etc.

In that case, if we want to avoid processing the full input when reacting to a start element in order to keep the benefits of streaming, we have to use state management very similar to what would be needed for a SAX implementation.
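
To make the point concrete, here is a small sketch of a pull stage that has to react to an enclosing element while still passing events through one at a time. The <shout> element and the ShoutStage class are invented for the example. Because the stage returns after every event, the "am I inside the element?" information cannot live in local variables or in the call stack; it has to be kept in a field, much as a SAX handler would keep it.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

/** Hypothetical streaming stage: upper-cases text inside <shout>, passes the rest through. */
final class ShoutStage {
    private static final QName SHOUT = new QName("shout");
    private final XMLEventFactory events = XMLEventFactory.newInstance();
    private final XMLEventReader parent;

    // Explicit state that must survive between two pulls.
    private int shoutDepth = 0;

    ShoutStage(XMLEventReader parent) { this.parent = parent; }

    boolean hasNext() { return parent.hasNext(); }

    XMLEvent next() throws XMLStreamException {
        XMLEvent in = parent.nextEvent();
        if (in.isStartElement() && SHOUT.equals(in.asStartElement().getName())) {
            shoutDepth++;
        } else if (in.isEndElement() && SHOUT.equals(in.asEndElement().getName())) {
            shoutDepth--;
        } else if (in.isCharacters() && shoutDepth > 0) {
            String text = in.asCharacters().getData();
            return events.createCharacters(text.toUpperCase(Locale.ROOT));
        }
        return in; // everything else is propagated unchanged, one event per pull
    }

    /** Test helper: runs the stage and concatenates all character data it emits. */
    static String textOf(String xml) {
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            ShoutStage stage = new ShoutStage(reader);
            StringBuilder text = new StringBuilder();
            while (stage.hasNext()) {
                XMLEvent event = stage.next();
                if (event.isCharacters()) {
                    text.append(event.asCharacters().getData());
                }
            }
            return text.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textOf("<doc>a<shout>b<i>c</i></shout>d</doc>"));
    }
}
```

The alternative, consuming the whole <shout> subtree inside a single call, would keep the state on the stack but would force the stage to buffer the entire enclosed section.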

I also have the feeling that because of the need for state management, we'll end up with quite complex structures, because of the mix of a callback and state automata approach with the pull approach where state is kept in the method calls stack and local variables.

Now I'd love to be proven wrong, since after considering these issues I've never actually experimented with this approach.

One big "problem" with this approach is that the "flow direction of events" is completely inverted. This means that StAX and SAX components would not be able to work "directly" together. But in a push-pull approach a conversion between StAX and SAX events has to be done as well, and furthermore this problem could be tackled by writing wrappers or adapters around the SAX components and adding them to a StAX pipe.

Absolutely. Converting Stax to SAX is fairly trivial, but the other way around requires buffering or multithreading. Have you looked at Stax-Utils [1]? It contains many classes to ease the SAX <-> Stax translation.
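
As a rough illustration of why the Stax to SAX direction is the easy one: the converter owns the loop, pulls each event and pushes it into a ContentHandler. The sketch below uses only JDK classes (it is not the Stax-Utils implementation) and makes simplifying assumptions that are flagged in the comments: prefixes, prefix mappings, comments and processing instructions are ignored. The StaxToSaxSketch class and its trace helper are names made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

final class StaxToSaxSketch {

    /** Pulls every event from the reader and pushes it to the SAX handler. */
    static void pump(XMLStreamReader in, ContentHandler out)
            throws XMLStreamException, SAXException {
        out.startDocument();
        while (in.hasNext()) {
            switch (in.next()) {
                case XMLStreamConstants.START_ELEMENT: {
                    AttributesImpl atts = new AttributesImpl();
                    for (int i = 0; i < in.getAttributeCount(); i++) {
                        // Simplification: the local name is reused as the qualified name.
                        atts.addAttribute(orEmpty(in.getAttributeNamespace(i)),
                                in.getAttributeLocalName(i), in.getAttributeLocalName(i),
                                "CDATA", in.getAttributeValue(i));
                    }
                    out.startElement(orEmpty(in.getNamespaceURI()),
                            in.getLocalName(), in.getLocalName(), atts);
                    break;
                }
                case XMLStreamConstants.END_ELEMENT:
                    out.endElement(orEmpty(in.getNamespaceURI()),
                            in.getLocalName(), in.getLocalName());
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                    out.characters(in.getTextCharacters(), in.getTextStart(), in.getTextLength());
                    break;
                default:
                    break; // comments, PIs and prefix mappings are left out of this sketch
            }
        }
        out.endDocument();
    }

    private static String orEmpty(String s) { return s == null ? "" : s; }

    /** Test helper: records the SAX callbacks produced for a small document. */
    static List<String> trace(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startElement(String uri, String local, String qName, Attributes a) {
                log.add("start:" + local);
            }
            @Override public void endElement(String uri, String local, String qName) {
                log.add("end:" + local);
            }
            @Override public void characters(char[] ch, int start, int length) {
                log.add("text:" + new String(ch, start, length));
            }
        };
        try {
            pump(XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml)), handler);
        } catch (XMLStreamException | SAXException e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(trace("<a><b>hi</b></a>"));
    }
}
```

The reverse direction has no such loop to own: a SAX producer pushes events whenever it wants, so a pull consumer needs either a buffer holding the events or a second thread running the producer.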

At the moment we're developing a prototype for such a "pull only pipe" to get some experience with it.

Even if I may seem a bit negative above, keep up this work. As I said, I haven't actually experimented with Stax-based state management, so maybe my feelings were wrong and I'm very interested in seeing what you can come up with.

Now there's one very interesting use case for Stax we should not forget: communication with remote APIs in an xmlrpc style, where the response body contains both status information useful to a controller and actual data that can be used by a pipeline. In that case, the application controller should be able to pull a few events from the request until it has all the necessary information to decide what to do next, and then replay the full request event stream into a pipeline.

A typical example is the Flickr "REST" response [2], which BTW is actually not REST at all since the status code is in the response body rather than in the HTTP status. A typical controller for this API would be:


 InputStream flickrResponse = callFlickerAPI("foo");
 PushBackStreamReader input = new PushBackStreamReader(flickrResponse);
 // move to the root element, <rsp stat="ok"> on success
 input.nextTag();
 if ("ok".equals(input.getAttributeValue(null, "stat"))) {
     // go back to the first event in the stream
     input.reset();
     Pipeline pipe = new Pipeline();
     pipe.setGenerator(input);
     // ... build the pipeline and run it ...
 } else {
     sendErrorResponse("Flickr failed");
 }

(note that in "pipe.setGenerator(input)" I don't care if the pipeline is Stax-based or SAX-based with a Stax to SAX converter)
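
The peek-then-replay idea can be approximated with plain JDK classes by working at the byte level instead of the event level. This is only a sketch under stated assumptions: the PeekAndReplaySketch class and PEEK_LIMIT are invented names, the stream passed in must support mark/reset (a BufferedInputStream here), and whatever the parser reads ahead while locating the root element is assumed to fit within PEEK_LIMIT bytes. A real push-back reader working on events would avoid parsing the prologue twice.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class PeekAndReplaySketch {
    // Assumption: the parser's read-ahead while reaching the root element stays below this.
    private static final int PEEK_LIMIT = 64 * 1024;

    /** Reads an attribute of the root element, then rewinds the stream to its start. */
    static String peekRootAttribute(InputStream markable, String attribute)
            throws IOException, XMLStreamException {
        markable.mark(PEEK_LIMIT);
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(markable);
        reader.nextTag(); // skip the prologue and stop on the root start tag
        String value = reader.getAttributeValue(null, attribute);
        markable.reset(); // the next consumer sees the document from its first byte
        return value;
    }

    /** Test helper: returns the peeked value, a '|' and the bytes replayed after the peek. */
    static String demo(String xml) {
        try {
            InputStream in = new BufferedInputStream(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            String stat = peekRootAttribute(in, "stat");
            ByteArrayOutputStream replayed = new ByteArrayOutputStream();
            for (int b = in.read(); b != -1; b = in.read()) {
                replayed.write(b);
            }
            return stat + "|" + new String(replayed.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException | XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("<rsp stat=\"ok\"><photos/></rsp>"));
    }
}
```

After the peek, the same stream can be handed to whatever generator starts the pipeline, Stax-based or SAX-based.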

I hope I was able to point out the nub of our thoughts. So, what do you think?

Yes, you got it! And sorry for throwing at you a large email for your first participation :-)

But you'll quickly learn that cocoon-dev is a friendly place where everybody can voice his opinions... and have them challenged :-P


Sylvain

[1] http://stax-utils.dev.java.net/
[2] http://www.flickr.com/services/api/response.rest.html

--
Sylvain Wallez - http://bluxte.net
