Andreas Pieber wrote:
First of all, my name is Andreas and I'm one of the students working on the StAX
implementation for Cocoon. Therefore, hello from my colleagues and me.
Hi Andreas and colleagues!
Secondly, this is my first post ever to the mailing list of an open source
project, and it is such a long post to answer. Thank you Sylvain ;) Nevertheless
I'm going to try my best.
Doh, sorry for that. But at least this brought some material for the
discussion :-P
We (when I say we, I mean us students, strongly influenced by Reinhard and Steven
:)) also thought about the problems described by you and came to the same
conclusion.
Good to hear!
Therefore we're trying another approach: pulling StAX XmlEvents
through the entire pipeline from the end.
In other words, if we have a simple pipe of the following form:
Producer - Transformer - Serializer
the Serializer would have in its start method some code like:
    while(parent.hasNext()){
        xmlOutputWriter.add(parent.getNext());
    }
retrieving the next event from the Transformer in this case and writing it into an
XmlOutputWriter. The Transformer in turn calls the getNext method on the
Starter (the Producer in this case), which retrieves the XmlEvents directly from
the XmlInputReader.
In this approach the Transformer needs (of course) some kind of buffer, since in
response to one event from the parent a lot of new content could be produced by
the transformer. This content is retrieved only one event at a time, as the next
pipeline component calls getNext, which explains the need for some kind of
buffer.
Of course this buffer and some more helper code have to be provided to avoid
code duplication and to help the developer.
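
The pull-only pipe and the transformer buffer described above could be sketched
roughly as follows. This is only an illustration under assumptions: it uses the
standard javax.xml.stream event API (nextEvent rather than the getNext used
above), the class names AnnotatingTransformer and PullPipelineSketch are
hypothetical, and inserting a comment after each start element is just a
stand-in for "one input event producing several output events".

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
import javax.xml.stream.util.EventReaderDelegate;

/** Hypothetical transformer: emits an extra comment after every start element. */
class AnnotatingTransformer extends EventReaderDelegate {
    private final XMLEventFactory factory = XMLEventFactory.newInstance();
    // Events already produced but not yet pulled by the next component.
    private final Deque<XMLEvent> buffer = new ArrayDeque<>();

    AnnotatingTransformer(XMLEventReader parent) {
        super(parent);
    }

    @Override
    public boolean hasNext() {
        return !buffer.isEmpty() || super.hasNext();
    }

    @Override
    public XMLEvent nextEvent() throws XMLStreamException {
        if (buffer.isEmpty()) {
            // Pull one event from the parent; it may yield several output events.
            XMLEvent in = super.nextEvent();
            buffer.add(in);
            if (in.isStartElement()) {
                String name = in.asStartElement().getName().getLocalPart();
                buffer.add(factory.createComment("seen " + name));
            }
        }
        return buffer.remove();
    }
}

class PullPipelineSketch {
    /** Producer - Transformer - Serializer, driven entirely from the serializer end. */
    static String run(String xml) throws XMLStreamException {
        XMLEventReader producer =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        XMLEventReader transformer = new AnnotatingTransformer(producer);
        StringWriter out = new StringWriter();
        XMLEventWriter serializer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        while (transformer.hasNext()) {
            serializer.add(transformer.nextEvent());
        }
        serializer.flush();
        serializer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(run("<a><b>text</b></a>"));
    }
}
```

Only hasNext and nextEvent are overridden here; a real component would also
have to keep peek and next consistent with the buffer.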
I thought about that approach as well, but it doesn't avoid state
management, which is the main complexity that Stax is supposed to solve.
This is still callback-based processing, although we have here pull
callbacks rather than push callbacks.
Now you're right: a single pull callback can consume several input
events that are related, thus making it easy to process a subtree of
several closely related elements from the input. It would for example
radically simplify the implementation of the I18nTransformer where
<i18n:translate> and <i18n:choose> have a nested structure.
But in many situations the elements of interest to a transformer enclose
large document sections that are to be propagated without modification.
Examples are JXTemplateTransformer or FormsTransformer (but does anybody
still use these instead of their generator replacements?),
RoleFilterTransformer, SQLTransformer, LuceneIndexTransformer,
MailTransformer, etc.
In that case, if we want to avoid processing the full input when
reacting to a start element in order to keep the benefits of streaming,
we have to use state management very similar to what would be needed for
a SAX implementation.
I also have the feeling that because of the need for state management,
we'll end up with quite complex structures, because of the mix of a
callback and state automata approach with the pull approach where state
is kept in the method calls stack and local variables.
Now I'd love to be proven wrong, since after considering these issues
I've never actually experimented with this approach.
One big "problem" in this approach is that the "flow direction of events" is
completely inverted. This means that StAX and SAX components would not be able
to work "directly" together. But a push-pull approach also requires a conversion
between StAX and SAX events, and furthermore this problem could
be tackled by writing wrappers or adapters around the SAX components and adding
them to a StAX pipe.
Absolutely. Converting Stax to SAX is fairly trivial, but the other way
around requires buffering or multithreading. Have you looked at
Stax-Utils [1]? It contains many classes to ease the SAX <-> Stax
translation.
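
To illustrate why the Stax to SAX direction is the easy one, here is a minimal
sketch that pulls from an XMLStreamReader and pushes into a SAX ContentHandler.
It is not the Stax-Utils implementation: the class StaxToSaxSketch and its
methods pump and trace are hypothetical names, and the sketch deliberately
skips prefix mappings, comments and processing instructions.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class StaxToSaxSketch {
    /** Pulls every event from the reader and pushes it as a SAX callback. */
    static void pump(XMLStreamReader in, ContentHandler out)
            throws XMLStreamException, SAXException {
        out.startDocument();
        while (in.hasNext()) {
            switch (in.next()) {
                case XMLStreamConstants.START_ELEMENT: {
                    AttributesImpl atts = new AttributesImpl();
                    for (int i = 0; i < in.getAttributeCount(); i++) {
                        atts.addAttribute(uri(in.getAttributeNamespace(i)),
                                in.getAttributeLocalName(i),
                                qName(in.getAttributePrefix(i), in.getAttributeLocalName(i)),
                                "CDATA", in.getAttributeValue(i));
                    }
                    out.startElement(uri(in.getNamespaceURI()), in.getLocalName(),
                            qName(in.getPrefix(), in.getLocalName()), atts);
                    break;
                }
                case XMLStreamConstants.END_ELEMENT:
                    out.endElement(uri(in.getNamespaceURI()), in.getLocalName(),
                            qName(in.getPrefix(), in.getLocalName()));
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                case XMLStreamConstants.SPACE:
                    out.characters(in.getTextCharacters(), in.getTextStart(), in.getTextLength());
                    break;
                default:
                    // Comments, PIs, DTD and END_DOCUMENT are ignored in this sketch.
                    break;
            }
        }
        out.endDocument();
    }

    private static String uri(String ns) {
        return ns == null ? "" : ns;
    }

    private static String qName(String prefix, String local) {
        return (prefix == null || prefix.isEmpty()) ? local : prefix + ":" + local;
    }

    /** Small demo handler that records the SAX callbacks it receives. */
    static String trace(String xml) throws XMLStreamException, SAXException {
        StringBuilder log = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                log.append("start:").append(qName).append(';');
                for (int i = 0; i < atts.getLength(); i++) {
                    log.append('@').append(atts.getQName(i)).append('=')
                       .append(atts.getValue(i)).append(';');
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                log.append("end:").append(qName).append(';');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                log.append("text:").append(ch, start, length).append(';');
            }
        };
        XMLStreamReader in =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        pump(in, handler);
        return log.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(trace("<r status=\"ok\"><c>hi</c></r>"));
    }
}
```

The loop owns the control flow and simply calls the handler, so no buffering or
extra thread is needed. Going from SAX to Stax is harder because the SAX parser
owns the control flow and cannot be paused between callbacks.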
At the moment we're developing a prototype for such a "pull only pipe" to get
some experience with it.
Even if I may seem a bit negative above, keep up this work. As I
said, I haven't actually experimented with Stax-based state management, so
maybe my feelings are wrong, and I'm very interested in seeing what you
can come up with.
Now there's one very interesting use case for Stax we should not forget:
communication with remote APIs in an XML-RPC style where the response body
contains both status information useful to a controller, and actual data
that can be used by a pipeline. In that case, the application controller
should be able to pull a few events from the response until it has all
the necessary information to decide what to do next, and then replay the
full response event stream into a pipeline.
A typical example is the Flickr "REST" response [2], which BTW is
actually not REST at all since the status code is in the response body
rather than in the HTTP status. A typical controller for this API would be:
    InputStream flickrResponse = callFlickerAPI("foo");
    PushBackStreamReader input = new PushBackStreamReader(flickrResponse);

    // move to the root element, which carries the status
    input.nextTag();
    if ("ok".equals(input.getAttributeValue(null, "status"))) {
        // go back to the first event in the stream
        input.reset();
        Pipeline pipe = new Pipeline();
        pipe.setGenerator(input);
        ... build the pipeline and run it ...
    } else {
        sendErrorResponse("Flickr failed");
    }
(note that in "pipe.setGenerator(input)" I don't care if the pipeline is
Stax-based or SAX-based with a Stax to SAX converter)
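
The "peek, then replay" idea can be tried out with the standard event API
alone. The sketch below is one possible shape under assumptions, not an
existing Cocoon or Stax class: ReplayingEventReader and ControllerSketch are
hypothetical names, the "status" attribute and the sample documents follow the
pseudo-code above rather than any real service, and the "pipeline" is reduced
to collecting element names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.XMLEvent;
import javax.xml.stream.util.EventReaderDelegate;

/** Hypothetical reader that records the events pulled so far and can replay them. */
class ReplayingEventReader extends EventReaderDelegate {
    private final List<XMLEvent> recorded = new ArrayList<>();
    private int cursor = 0;
    private boolean recording = true;

    ReplayingEventReader(XMLEventReader parent) {
        super(parent);
    }

    @Override
    public boolean hasNext() {
        return cursor < recorded.size() || super.hasNext();
    }

    @Override
    public XMLEvent nextEvent() throws XMLStreamException {
        if (cursor < recorded.size()) {
            return recorded.get(cursor++);   // replaying an event seen earlier
        }
        XMLEvent event = super.nextEvent();
        if (recording) {                     // only the peeked prefix is kept
            recorded.add(event);
            cursor++;
        }
        return event;
    }

    /** Goes back to the first event; later events are streamed without recording. */
    void rewind() {
        cursor = 0;
        recording = false;
    }
}

class ControllerSketch {
    static String handle(String responseXml) throws XMLStreamException {
        ReplayingEventReader input = new ReplayingEventReader(
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(responseXml)));

        // Pull just enough events to reach the root element.
        XMLEvent event = input.nextEvent();
        while (!event.isStartElement()) {
            event = input.nextEvent();
        }
        Attribute status = event.asStartElement().getAttributeByName(new QName("status"));
        if (status == null || !"ok".equals(status.getValue())) {
            return "error";
        }

        // Replay the full stream; a real controller would hand it to a pipeline.
        input.rewind();
        List<String> names = new ArrayList<>();
        while (input.hasNext()) {
            XMLEvent e = input.nextEvent();
            if (e.isStartElement()) {
                names.add(e.asStartElement().getName().getLocalPart());
            }
        }
        return "ok:" + String.join(",", names);
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(handle("<rsp status=\"ok\"><photo/><photo/></rsp>"));
        System.out.println(handle("<rsp status=\"fail\"><err/></rsp>"));
    }
}
```

Only the few events consumed by the controller are held in memory; once they
have been replayed, the rest of the document streams through as usual.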
I hope I was able to point out the nub of our thoughts. So, what do you think?
Yes, you got it! And sorry for throwing at you a large email for your
first participation :-)
But you'll quickly learn that cocoon-dev is a friendly place where
everybody can voice his opinions... and have them challenged :-P
Sylvain
[1] http://stax-utils.dev.java.net/
[2] http://www.flickr.com/services/api/response.rest.html
--
Sylvain Wallez - http://bluxte.net