Re: [RT] Input Pipelines (long)

Daniel Fagerstrom Wed, 18 Dec 2002 15:28:42 -0800

Nicola Ken Barozzi wrote:

Daniel Fagerstrom wrote:
[...]

Cocoon is symmetric, if you see it as it really is, a system that transforms a Request in a Response.

The problem arises in the way we have defined the request and the response: The Request is a URL, the response is a Stream.

So actually Cocoon transforms URIs in a stream.

The sitemap is the system that demultiplexes URIs by associating them with actual source of the data. This makes cocoon richer than a system that just hands an entity to transform: Cocoon uses indirect references (URLs) instead.

The Stream as an input is a specialization, so I can say in the request to get stuff from the stream.

More on this later.
In a sitemap an input pipeline could be used e.g. for implementing a
web service:
<match pattern="myservice">
  <generate type="xml">
    <parameter name="scheme" value="myInputFormat.scm"/>
  </generate>
  <transform type="xsl" src="myInputFormat2MyStorageFormat.xsl"/>
  <serialize type="dom-session" non-terminating="true">
    <parameter name="dom-name" value="input"/>
  </serialize>
  <select type="pipeline-state">
    <when test="success">
      <act type="my-business-logic"/>
      <generate type="xsp" src="collectTheResult.xsp"/>
      <serialize type="xml"/>
    </when>
    <when test="non-valid">
      
    </when>
  </select>
</match>
What you correctly point out, is that the Generation phase not get the source, but just transform it to SAX.

<snip/>

But IMHO this has a deficiency of fixing the source from the input.

My intension was that that when not using src attribute, the generator should read the input stream.

Think about having good Source Protocols.

We could write:

<match pattern="myservice">
<generate type="xml" src="inputstream:myInputFormat.scm"/>
...
</match>

This can easily make all my Generators able to work with the new system right away.

This seem to be a better solution. Can please expand about why you put the scheme in the inputstream: protocol.

Here we have first an input pipeline that reads and validates xml
input, transforms it to some appropriate format and store the result
as a dom-tree in a session attribute. A serializer normally means that
the pipeline should be executed and thereafter an exit from the
sitemap. I used the attribute non-terminating="true", to mark that
the input pipeline should be executed but that there is more to do in
the sitemap afterwards.

Pipelines can already call one another.
We add the serializer at the end, but it's basically skipped, thus making your pipeline example.

The idea is using two pipelines, executed in sequence, for processing a post. First the input pipeline that is responsible for reading the input data, trandform it to an appropriate format and store it, after that the stored data can be used for the business logic that can be called from an action, after the action an ordinary output pipeline is executed for publishing the result of the business logic, for sending the next form page etc.

In this scenario the serializer in the input pipeline is responsible for storing the input data and can thus not be skipped. Furthermore as we are going to execute two pipelines in sequence, the first serializer must not mean an exit from the sitemap as it normally would do.

I think it is better SoC and reuse of components, to let a serializer be responsible for storing input data than to use transformers for that. Write DOM session transformer, source writing transformer, SQLTransformer used for inserting data and the session transformer would IMHO be more natural as serializers.

I would think that with the blocks discussion there has been some advancement on the definition of pipeline fragments.
I didn't follow it closely though, anyone care to comment?

After the input pipeline there is a selector that select the output
pipeline depending of if the input pipeline succeed or not. This use
of selection have some relation to the discussion about pipe-aware
selection (see [3] and the references therein). It would solve at
least my main use cases for pipe-aware selection, without having its
drawbacks: Stefano considered pipe-aware selection mix of concern,
selection should be based on meta data (pipeline state) rather than on
data (pipeline content). There were also some people who didn't like
my use of buffering of all input to the pipe-aware selector. IMO the
use of selectors above solves booth of these issues.


I don't see this. Can you please expand here?

1. Selection should be based on pipeline state instead of pipeline data. First the input pipeline is executed and is able to set the state of the pipeline. After that ordinary selects can be used for deciding how to construct the output pipeline. The selectors for the output pipeline has no access some pipeline content and are used in exactly the same way as selector allwys are used.

2. No use of buffering within the pipeline. IIRC some people were concerned with that pipe aware selection based on buffering of the sax events before the selection, could be very inefficient if there is much data in the pipeline. As my main use case for pipe aware selection was to use it after transformers with side effects, and after validation of user submitted input data. I never saw it as problem as the amount of data in the mentioned cases typically is quite small. Anyway, with input pipelines selection is restricted to cases where the input was going to be stored by the system anyhow.

[...]

In Flowscripts
--------------

IIRC the discussion and examples of input for flowscripts this far has
mainly dealed with request parameter based input. If we want to use
flowscripts for describing e.g. web service flow, more advanced input
handling is needed. IMO it would be an excelent SOC to use output
pipelines for the presentation of the data used in the system, input
pipelines for going from input to system data, java objects (or some
other programming language) for describing business logic working on
the data within the system, and flowscripts for connecting all this in
an appropriate temporal order.


Hmmm, this seems like a compelling use case.
Could you please add a concrete use-case/example for this?
Thanks :-)

One use case, (if combined with persistent storage of continuations), would be a workflow system.

Besides that, input pipelines are IMO very usefull for handling request parameters from forms as well. In all webapps that we build at my company, we use absolute xpaths as request parameter names and then use a generator that builds a xml document from the name/value pairs. This xml input is then possibly transformed to another format and therafter stored in a db or as a dom tree in a session attribute.

A flowscript that uses input pipelines might look like:

handleForm("formPage1.html", "storeData1");
if (objectModel["state"] == "succees")
doBusinessLogic1(...);
...

Where formPage1.html is an output pipeline that produces a form and storeData handles and store the input.

For Reuseability Between Blocks
-------------------------------

There have been some discussions about how to reuse functionality
between blocks in Cocoon (see the threads [1] and [2] for
background). IMO (cf. my post in the thread [1]), a natural way of
exporting pipeline functionality is by extending the cocoon pseudo
protocol, so that it accepts input as well as produces output. The
protocol should also be extended so that input as well as output can
be any octet stream, not just xml.

If we extend generators so that their input can be set by the
environment (as proposed in the discussion about input pipelines), we
have what is needed for creating a writable cocoon protocol. The web
service example in the section "In Sitemaps" could also be used as an
internal service, exported from a block.

Booth input and output for the extended cocoon protocol can be booth
xml and non-xml, this give us 4 cases:

xml input, xml output: could be used from a "pipeline"-transformer,
the input to the transformer is redirected to the protocol and the
output from the protocol is redirected to the output of the
transformer.

non-xml input, xml output: could be used from a generator.

xml input, non-xml output: could be used from a serializer.

non-xml input, non-xml output: could be used from a reader if the
input is ignored, from a "writer" if the output is ignored and from a
"reader-writer", if booth are used.

Generators that accepts xml should of course also accept sax-events
for efficiency reasons, and serializer that produces xml should of the
same reason also be able to produce sax-events.


Also this seems interesting.

Please add concrete examples here to, possibly applied to blocks.
I know it's hard, but it would really help.

What I tried to describe is just a somewhat different approach to how to describe reusable pipeline fragments between blocks, so for use cases please see Sylvains and Stefanos original posts in the threads [1] and [2].

Lets take a look on an example from Sylvains post (in [1]) to illustrate what I have in mind:

<map:match pattern="a_page">
<map:generate src="an_xdoc.xml"/>
<map:transform type="pipeline" src="xdoc2skinnedHtml"/>
<map:serialize type="html"/>
</map:match>

<map:match pattern="xdoc2skinnedHtml">
<map:generate type="dont_care"/>
<map:transform type="i18n"/>
<map:transform type="xdoc2html.xsl"/>
<map:transform type="htmlskin.xsl"/>
<map:serialize type="dont_care"/>
</map:match>

Here the idea is that when xdoc2skinnedHtml is used from a pipeline transformer the generator and the serializer is not used and only the sub pipeline consisting of the three transformers in the middle is used. This behaviour is inspired by the cocoon: protocol where the serializer is skipped.

Several people thought that the removal of parts of generators and serializer depending on the usage context of the pipeline, confusing. Carsten wrote that:
"It is correct, that internally in most cases the serializer
of a pipeline is ignored, when the cocoon protocol is used.
But this is only because of performance."
And that a pipeline used from the cocoon protocol is supose to end with an xml serializer. I agree with this and think that it would be better to express the example above as (cf with my post in [1]):

<map:match pattern="a_page">
<map:generate src="an_xdoc.xml"/>
<map:transform type="pipeline" src="cocoon:xdoc2skinnedHtml"/>
<map:serialize type="html"/>
</map:match>

<map:match pattern="xdoc2skinnedHtml">
<map:generate src="inputstream:xdoc.scm"/>
<map:transform type="i18n"/>
<map:transform type="xdoc2html.xsl"/>
<map:transform type="htmlskin.xsl"/>
<map:serialize type="xml"/>
</map:match>

Here the cocoon: protocol is suposed to be a writable source. The function of the pipeline transformer is that it serializes its xml input, redirect it to the writable source in the src attribute, parses the xml output stream from the source and output the result from the parser as sax events. Of course the serialize-parse steps should be optimzed away, but this should be considered an implementation detail not part of the semantics.

By further generalizing the cocoon: protocol so that it allows non-xml output (and input) it can be used for the pipeline serializer that Sylvain proposed as well. For the pipeline generator the cocoon: protocol can be used as is.

It seems that what you propose Cocoon already mostly has, but it's more the use-case and some minor additions that have to be put forward.
Conclusion
----------

The ability to handle structured input (e.g. xml) in a convenient way,
will probably be an important requirement on webapp frameworks in the
near future.

By removing the asymmetry between generators and serializers, by letting
the input of a generator be set by the context and the output of a
serializer be set from the sitemap, Cocoon could IMO be as good in
handling input as it is today in producing output.
Cocoon already does this, no?
Can't we use the cocoon:// protocol to get the results of a pipeline from another one? What would change?

As said above, the cocoon protocol should be writable as well as readable and allow for non xml input and output. The block protocol could use the same ideas and thus give a good way of exporting functionality.

To realize the above ideas we would need to implement the inputstream protocol that in turn would require that the Request interface is extended with a getInputStream() method. The cocoon protocol should be extended as described. The proposed extension of the serializer for the use in input pipelines would require serializers to implement SitemapModelComponent.

Thank you for your comments.

/Daniel Fagerstrom

<snip/>

References
----------

[1] [RT] Using pipeline as sitemap components (long)
http://marc.theaimsgroup.com/?t=103787330400001&r=1&w=2

[2] [RT] reconsidering pipeline semantics
http://marc.theaimsgroup.com/?t=102562575200001&r=2&w=2

[3] [Contribution] Pipe-aware selection
http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=101735848009654&w=2




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, email: [EMAIL PROTECTED]

Re: [RT] Input Pipelines (long)

Reply via email to