Stefano Mazzocchi wrote:
> Hmmm, maybe deep architectural discussions are good during holiday
> seasons... we'll see :)
Not for me, I've been away from computers for a while. But you and
Nicola Ken seem to have had an interesting discussion :)
The discussion about input pipelines can be divided into two parts:
1. Improving the handling of the input stream in Cocoon. This is needed
for web services. It is also needed to make it possible to implement
a writable cocoon:-protocol, something that IMO would be very useful for
reusing functionality in Cocoon, especially from blocks.
2. The second part of the proposal is to use two pipelines, executed in
sequence, to respond to input in Cocoon. The first pipeline (called the
input pipeline) is responsible for reading the input, from request
parameters or from the input stream, transforming it to an appropriate
format and storing it in e.g. a session parameter, a file or a db. After
the input pipeline there is an ordinary (output) pipeline that is
responsible for generating the response. The output pipeline is executed
after the input pipeline has completed. As a consequence, actions and
selections in the output pipeline can depend e.g. on whether the
handling of input succeeded and on the data that was stored by the
input pipeline.
Here I will focus on your comments on the second part of the proposal.
> Daniel Fagerstrom wrote:
<snip/>
>> In Sitemaps
>> -----------
>>
>> In a sitemap an input pipeline could be used e.g. for implementing a
>> web service:
>>
>> <match pattern="myservice">
>> <generate type="xml">
>> <parameter name="scheme" value="myInputFormat.scm"/>
>> </generate>
>> <transform type="xsl" src="myInputFormat2MyStorageFormat.xsl"/>
>> <serialize type="dom-session" non-terminating="true">
>> <parameter name="dom-name" value="input"/>
>> </serialize>
>> <select type="pipeline-state">
>> <when test="success">
>> <act type="my-business-logic"/>
>> <generate type="xsp" src="collectTheResult.xsp"/>
>> <serialize type="xml"/>
>> </when>
>> <when test="non-valid">
>> <!-- produce an error document -->
>> </when>
>> </select>
>> </match>
>>
>> Here we first have an input pipeline that reads and validates xml
>> input, transforms it to some appropriate format and stores the result
>> as a dom-tree in a session attribute. A serializer normally means that
>> the pipeline should be executed, followed by an exit from the
>> sitemap. I used the attribute non-terminating="true" to mark that
>> the input pipeline should be executed but that there is more to do in
>> the sitemap afterwards.
>>
>> After the input pipeline there is a selector that selects the output
>> pipeline depending on whether the input pipeline succeeded or not. This
>> use of selection is related to the discussion about pipe-aware
>> selection (see [3] and the references therein). It would solve at
>> least my main use cases for pipe-aware selection, without having its
>> drawbacks: Stefano considered pipe-aware selection a mix of concerns,
>> since selection should be based on meta data (pipeline state) rather
>> than on data (pipeline content). There were also some people who didn't
>> like my buffering of all input to the pipe-aware selector. IMO the
>> use of selectors above solves both of these issues.
>>
>> The output pipeline starts with an action that takes care of the
>> business logic for the application. This is IMHO a more legitimate use
>> for actions than the current mix of input handling and business logic.
>
>
> Wouldn't the following pipeline achieve the same functionality you want
> without requiring changes to the architecture?
>
> <match pattern="myservice">
> <generate type="payload"/>
> <transform type="validator">
> <parameter name="scheme" value="myInputFormat.scm"/>
> </transform>
> <select type="pipeline-state">
> <when test="valid">
> <transform type="xsl" src="myInputFormat2MyStorageFormat.xsl"/>
> <transform type="my-business-logic"/>
> <serialize type="xml"/>
> </when>
> <otherwise>
> <!-- produce an error document -->
> </otherwise>
> </select>
> </match>
Yes, it would achieve about the same functionality as I want, and it
could easily be implemented with the help of the small extensions to the
sitemap interpreter that I implemented for pipe-aware selection [3].
I think it would be interesting to compare our proposals in detail: how
the input stream and validation are handled, how the selection based on
pipeline state is performed, whether storage of the input is done in a
serializer or in a transformer, and how the new output is created.
Input Stream
------------
For input stream handling you used
<generate type="payload"/>
Is the payload generator equivalent to the StreamGenerator? Or does it
do something more, like switching parsers depending on the mime type of
the input stream?
I used
<generate type="xml"/>
The idea is that if no src attribute is given, the sitemap interpreter
automatically connects the generator to the input stream of the
environment (the input stream from the http request in the servlet case;
in other cases it is less clear). This behavior was inspired by the
handling of standard input in unix pipelines.
Nicola Ken proposed:
<generate type="xml" src="inputstream://"/>
I prefer this solution to mine as it doesn't require any change to the
sitemap interpreter. I also believe that it is easier to understand as
it is more explicit. It also (as Nicola Ken has explained) gives a good
SoC: the uri in the src attribute describes where to read the resource
from, e.g. input stream, file, cvs, http, ftp, etc., and the generator
is responsible for how to parse the resource. If we develop an
input stream protocol, all the work invested in the existing generators
can immediately be reused in web services.
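
The split between "where to read from" and "how to parse" can be
illustrated with a minimal Java sketch. None of the names below
(SourceLocatorSketch, XmlGeneratorSketch, InputStreamDemo) are Cocoon
APIs; they are hypothetical stand-ins, and the request body is simulated
with an in-memory byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical "protocol" side: it only locates a resource and delivers bytes.
final class SourceLocatorSketch {
    private final Map<String, Supplier<InputStream>> schemes = new HashMap<>();

    void register(String scheme, Supplier<InputStream> opener) {
        schemes.put(scheme, opener);
    }

    InputStream open(String uri) {
        int colon = uri.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("No scheme in: " + uri);
        }
        Supplier<InputStream> opener = schemes.get(uri.substring(0, colon));
        if (opener == null) {
            throw new IllegalArgumentException("Unknown scheme in: " + uri);
        }
        return opener.get();
    }
}

// Hypothetical "generator" side: it only parses the stream it is handed.
final class XmlGeneratorSketch {
    static Document generate(InputStream in) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(in);
    }
}

final class InputStreamDemo {
    // Simulates a request body and returns the root element name of the parsed input.
    static String rootNameOf(String requestBody) {
        SourceLocatorSketch locator = new SourceLocatorSketch();
        byte[] bytes = requestBody.getBytes(StandardCharsets.UTF_8);
        locator.register("inputstream", () -> new ByteArrayInputStream(bytes));
        try (InputStream in = locator.open("inputstream://")) {
            return XmlGeneratorSketch.generate(in).getDocumentElement().getTagName();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Registering another scheme would leave the generator side untouched,
which is the reuse argued for above.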
Validation
----------
Should validation be part of the parsing of input as in:
<generate type="xml">
<parameter name="scheme" value="myInputFormat.scm"/>
</generate>
or should it be a separate transformation step:
<transform type="validator">
<parameter name="scheme" value="myInputFormat.scm"/>
</transform>
or maybe the responsibility of the protocol as Nicola Ken proposed in
one of his posts:
<generate type="xml" src="inputstream:myInputFormat.scm"/>
This is not a question about architecture but rather one about finding
"best practices".
I don't think validation should be part of the protocol. It would mean
that the protocol has to take care of the parsing, and that would muddle
the SoC that Nicola Ken has argued for in his other posts, where the
protocol is responsible for locating and delivering the stream and the
generator is responsible for parsing it.
Should validation be part of the generator or a transform step? I don't
know. If the input is not xml, as for the ParserGenerator, I guess that
the validation must take place in the generator. If the xml parser
validates the input as a part of the parsing, it is more practical to let
the generator be responsible for validation (IIRC Xerces2 has an
internal pipeline structure and performs validation in a transformer
like way, so for Xerces2 it would probably be as efficient to do
validation in a transformer as in a generator). Otherwise it seems to
give better SoC to separate the parsing and the validation steps, so that
we can have one validation transformer for each schema language.
In some cases it might be practical to augment the xml document with
error information to be able to give more exact user feedback on where
the errors are located. For such applications it seems more natural to me
to have validation in a transformer.
A question that might have architectural consequences is how the
validation step should report validation errors. If the input is not
parseable at all, there is not much more to do than throw an exception
and let the ordinary internal error handler report the situation. If
some of the elements or attributes in the input have the wrong type, we
probably want to return more detailed feedback than just the internal
error page. Some possible validation error report mechanisms are:
storing an error report object in the environment, e.g. in the object
model; augmenting the xml document with error reporting attributes or
elements; throwing an exception object that contains a detailed error
description object; or a combination of some of these mechanisms.
Mixing data and state information was considered to be a bad practice in
the discussion about pipe-aware selection (see references in [3]). That
rules out using only augmentation of the xml document as the error
reporting mechanism. Throwing an exception would AFAIU lead to
difficulties in giving customized error reports. So I believe it would
be best to put some kind of state describing object in the environment
and possibly combine this with augmentation of the xml document.
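
As an illustration of the "state object in the environment" option, here
is a minimal Java sketch. The ValidationReport class, its object-model
key and the toy "quantity" check are all assumptions made for the
example; the object model is represented as a plain java.util.Map.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical state object; neither the class nor the key is an existing Cocoon API.
final class ValidationReport {
    static final String KEY = "validation-report"; // assumed object-model key

    private final List<String> errors = new ArrayList<>();

    void addError(String location, String message) {
        errors.add(location + ": " + message);
    }

    boolean isValid() {
        return errors.isEmpty();
    }

    List<String> getErrors() {
        return Collections.unmodifiableList(errors);
    }
}

final class ValidationReportDemo {
    // Stand-in for a validating transformer: it records state instead of throwing.
    static void validate(Map<String, Object> objectModel, Map<String, String> input) {
        ValidationReport report = new ValidationReport();
        String quantity = input.get("quantity");
        if (quantity == null || !quantity.matches("\\d+")) {
            report.addError("/order/quantity", "expected a non-negative integer");
        }
        objectModel.put(ValidationReport.KEY, report);
    }

    // Stand-in for a selector test: it decides on the state, not on the xml content.
    static String select(Map<String, Object> objectModel) {
        Object value = objectModel.get(ValidationReport.KEY);
        boolean valid = value instanceof ValidationReport && ((ValidationReport) value).isValid();
        return valid ? "valid" : "non-valid";
    }
}
```

The error list keeps the location of each problem, so a later step could
still use it to augment the xml document for user feedback.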
Pipe State Dependent Selection
------------------------------
To select the response based on whether the input document is valid,
you suggest the following:
...
<transform type="validator">
<parameter name="scheme" value="myInputFormat.scm"/>
</transform>
<select type="pipeline-state">
<when test="valid">
<transform type="xsl" src="myInputFormat2MyStorageFormat.xsl"/>
...
As I mentioned earlier this could easily be implemented with the
"pipe-aware selection" code I submitted in [3]. Let us see how it would
work:
The PipelineStateSelector cannot be executed at pipeline construction
time, as ordinary selectors are. The pipeline before the selector,
including the ValidatorTransformer, must have been executed before the
selection is performed. This can be implemented by letting the
PipelineStateSelector implement a special marker interface, say
PipelineStateAware, so that it can have special treatment in the
selection part of the sitemap interpreter.
When the sitemap interpreter gets a PipelineStateAware selector, it first
ends the currently constructed pipeline with a serializer that stores its
sax input in e.g. a dom-tree. The pipeline is then processed, and the dom
tree with the cached result is stored in e.g. the object model. In the
next step the selector is executed, and it can base its decision on the
result from the first part of the pipeline. If the ValidatorTransformer
puts a validation result descriptor in the object model, the
PipelineStateSelector can perform tests on this result descriptor. In
the last step a new pipeline is constructed where the generator reads
from the stored dom tree and, in the example above, the first
transformer will be an XSLTTransformer.
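
The control flow described above can be sketched in Java as follows. This
is not the code from [3]; every name is hypothetical, documents are
reduced to strings, and pipeline steps to simple functions, just to show
where the marker interface changes the interpreter's behavior.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical selector contract, reduced to what the sketch needs.
interface SelectorSketch {
    boolean select(String test, Map<String, Object> objectModel);
}

// Marker interface: no methods, it only tells the interpreter
// "execute the pipeline built so far before asking me".
interface PipelineStateAware { }

final class PipelineStateSelectorSketch implements SelectorSketch, PipelineStateAware {
    static final String STATE_KEY = "pipeline-state"; // assumed key

    public boolean select(String test, Map<String, Object> objectModel) {
        return test.equals(objectModel.get(STATE_KEY));
    }
}

final class InterpreterSketch {
    static final String BUFFER_KEY = "buffered-result"; // assumed key

    final Map<String, Object> objectModel = new HashMap<>();
    private final List<UnaryOperator<String>> pending = new ArrayList<>();

    void addStep(UnaryOperator<String> step) {
        pending.add(step);
    }

    boolean evaluate(SelectorSketch selector, String test, String input) {
        if (selector instanceof PipelineStateAware) {
            // Run the pipeline built so far and buffer its result, so the
            // selector sees the state that the steps left behind.
            String doc = input;
            for (UnaryOperator<String> step : pending) {
                doc = step.apply(doc);
            }
            objectModel.put(BUFFER_KEY, doc);
            // A new pipeline would then be started from the buffered result.
            pending.clear();
        }
        return selector.select(test, objectModel);
    }
}
```

An ordinary selector (one that does not implement the marker) skips the
buffering branch and is evaluated at construction time as before.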
An alternative and more explicit way to describe the pipeline state
dependent selection above, is:
...
<transform type="validator">
<parameter name="scheme" value="myInputFormat.scm"/>
</transform>
<serialize type="object-model-dom" non-terminating="true">
<parameter name="name" value="validated-input"/>
</serialize>
<select type="pipeline-state">
<when test="valid">
<generate type="object-model-dom">
<parameter name="name" value="validated-input"/>
</generate>
<transform type="xsl" src="myInputFormat2MyStorageFormat.xsl"/>
...
Here the extensions to the current Cocoon semantics are put in the
serializer instead of the selector. The sitemap interpreter treats a
non-terminating serializer as an ordinary serializer in the sense that it
puts the serializer at the end of the current pipeline and executes it.
The difference is that instead of returning to the caller of the
sitemap interpreter, it creates a new current pipeline and continues to
interpret the component after the serializer, in this case a selector.
The sitemap interpreter will also ignore the output stream of the
serializer; the serializer is supposed to have side effects. The new
current pipeline will then get an ObjectModelDOMGenerator as generator
and an XSLTTransformer as its first transformer.
I prefer this construction to the more implicit one because it is more
obvious what it does, and also because it gives more freedom about how
to store the user input. Some people seem to prefer to store user input
in Java beans; in some applications session parameters might be a better
place than the object model.
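
A minimal Java sketch of how an interpreter could treat a non-terminating
serializer is given below. The node model is hypothetical and much
simpler than the real sitemap interpreter: selectors are left out, and
"executing" a pipeline is reduced to recording which components it
contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a sitemap node.
final class NodeSketch {
    final String kind;            // "generate", "transform" or "serialize"
    final String type;
    final boolean nonTerminating; // only meaningful for "serialize"

    NodeSketch(String kind, String type, boolean nonTerminating) {
        this.kind = kind;
        this.type = type;
        this.nonTerminating = nonTerminating;
    }
}

final class NonTerminatingInterpreter {
    // Returns the pipelines in the order in which they would be executed.
    static List<List<String>> interpret(List<NodeSketch> nodes) {
        List<List<String>> executed = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (NodeSketch node : nodes) {
            current.add(node.kind + ":" + node.type);
            if (node.kind.equals("serialize")) {
                // A serializer always completes and executes the current pipeline.
                executed.add(current);
                if (!node.nonTerminating) {
                    // Ordinary serializer: return to the caller of the interpreter.
                    return executed;
                }
                // Non-terminating: start a new current pipeline and keep interpreting.
                current = new ArrayList<>();
            }
        }
        return executed;
    }
}
```

Fed with the components from the example above (validator,
object-model-dom serializer, object-model-dom generator, xsl transformer,
xml serializer), this yields two pipelines executed in sequence.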
Pipelines with Side Effects
---------------------------
A common pattern in pipelines that handle input (at least in the
applications that I write) is that the first half of the pipeline takes
care of the input and ends with a transformer that stores the input. The
transformer can be e.g. the SQLTransformer (with insert or update
statements), the WriteDOMSessionTransformer or the
SourceWritingTransformer. These transformers have side effects: they
store something, and return an xml document that tells whether it
succeeded or not. A conclusion from the threads about pipe-aware
selection was that sending meta data, like whether the operation
succeeded or not, in the pipeline is a bad practice, and especially that
we should not allow selection based on such content. Given that these
transformers basically translate xml input to a binary format and
generate an xml output that we are supposed to ignore, it would IMO be
more natural to see them as some kind of serializer.
The second half of the pipeline creates the response. Here it is less
obvious what transformer to use. I normally use an XSLTTransformer and
typically ignore its input stream and only create an xml document that
is rendered into e.g. html in a subsequent transformer.
I think that it would be more natural to replace the pattern:
...
<transform type="store something, return state info"/>
<transform type="create a response document, ignore input"/>
...
with
...
<serialize type="store something, put state info in the environment"
non-terminating="true"/>
<generate type="create a response document" src="response document"/>
...
If we also give the serializer a destination attribute, all the
existing serializers could be used for storing input in files etc.
...
<serialize type="xml" dest="xmldb://..." non-terminating="true"/>
...
This would give the same SoC that I argued in favour of in the context
of input: the serializer is responsible for how to serialize from xml to
the binary data format, and the destination is responsible for where to
store the data.
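
The same split on the output side can be sketched in Java. The
interfaces below are hypothetical, not the Cocoon Serializer interface;
the destination is an in-memory map and the "mem://" uri is made up for
the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical: the destination decides where the bytes go ...
interface DestinationSketch {
    OutputStream open(String uri) throws IOException;
}

// ... and the serializer decides how xml becomes bytes.
interface SerializerSketch {
    void serialize(String xml, OutputStream out) throws IOException;
}

// In-memory stand-in for a file, db or other store.
final class InMemoryDestination implements DestinationSketch {
    final Map<String, ByteArrayOutputStream> stored = new HashMap<>();

    public OutputStream open(String uri) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        stored.put(uri, out);
        return out;
    }
}

final class StoreDemo {
    // Serializes the document to the destination and reads the stored bytes back.
    static String storeAndReadBack(String uri, String xml) {
        InMemoryDestination destination = new InMemoryDestination();
        SerializerSketch utf8 = (doc, out) -> out.write(doc.getBytes(StandardCharsets.UTF_8));
        try (OutputStream out = destination.open(uri)) {
            utf8.serialize(xml, out);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return new String(destination.stored.get(uri).toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Swapping the destination implementation changes where the data ends up
without touching the serializer, and the other way around.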
Conclusion
----------
I am afraid that I raise more questions than I answer in this RT. Many of
them are of a "best practice" character, do not have any architectural
consequences, and do not have to be answered right now. There are
however some questions that need an answer:
How should pipeline components, like the validation transformer, report
state information? Placing some kind of state object in the object model
would be one possibility, but I don't know.
We seem to agree that there is a need for selection in pipelines
based on the state of the computation in the pipeline that precedes the
selection. Here we have two proposals:
1. Introduce pipeline state aware selectors (e.g. by letting the
selector implement a marker interface), and give such selectors special
treatment in the sitemap interpreter.
2. Extend the semantics of serializers so that the sitemap interpreter
can continue to interpret the sitemap after a serializer, (e.g. by a new
non-terminating attribute for serializers).
I prefer the second proposal.
Both proposals can be implemented with no backward compatibility problems
at all by requiring the selectors or serializers that need the extended
semantics to implement a special marker interface, and by adding code
that reacts to the marker interface in the sitemap interpreter.
To use serializers more generally for storing things, as I proposed
above, the Serializer interface would need to extend the
SitemapModelComponent interface.
------
What do you think?
Daniel Fagerstrom
<snip/>
[3] [Contribution] Pipe-aware selection
http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=101735848009654&w=2