[RT] reconsidering pipeline semantics

Stefano Mazzocchi Tue, 02 Jul 2002 08:41:25 -0700

In light of the discussion on blocks, Sylvain pointed out that Cocoon
services should be mapped to pipelines and not directly to resources.


This observation triggered a few random thoughts (RT) that I would like
to share with you, hoping to spark further discussion.

NOTE: this is not related to blocks or flow; it concerns only the
sitemap semantics.

                                 - o -

What is a pipeline
------------------

The first and major architectural contribution that Cocoon brought to
the web world is the ability to compose web services using the "pipes
and filters" design pattern. (I'm using 'web services' in the original
sense of the term: any service that is related to the web.)

Cocoon decided to follow an XML-oriented approach to pipelines, forcing
everything into the XML realm and working on it from there. So Cocoon's
pipeline concept is somewhat an extension of the original "pipes and
filters" pattern: in fact, the Cocoon pipeline implements both the
'pipes and filters' and the (GoF) 'adapter' patterns.

Why? Well, this comes from the fact that the HTTP protocol is not
XML-oriented (unlike SOAP, for example). So, in order to perform XML
piping, we need to adapt in and out of the generic octet-stream world.

So, unlike a UNIX pipeline, which doesn't need adaptation (since the
STDIN/STDOUT streams are all octet-oriented), Cocoon needed to create
ways to adapt to the rest of the world, which is not XML-oriented.

For this reason, while a UNIX pipeline is composed like this

 input -> filter -[pipe]-> filter -[pipe]-> filter -> output

a Cocoon pipeline is composed like this

 input -> adaptor -[pipe]-> filter -[pipe]-> adaptor -> output

Unfortunately, the above picture isn't entirely correct, since the two
adaptors can't be exchanged; they are, in fact, different entities:
the first adapts an octet-based world to an XML-based world, the other
does the opposite. They are not symmetrical. In Cocoon terminology, the
first adapter is a generator, the second is a serializer.

We call 'Cocoon pipeline' the collection of all filters (transformers)
and adapters (generator and serializer) because there cannot be a
pipeline without adapters.
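To make the adapter/filter split concrete, here is a minimal Python
sketch. It is not Cocoon code (Cocoon is Java and streams SAX events);
it assumes a toy event model where SAX events are reduced to
("start" | "end" | "text", value) tuples, and the names run_pipeline,
lines_generator, upper_transformer and xml_serializer are hypothetical.
Note that only the generator and the serializer ever touch octets; the
transformer sees XML events only.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical, much-simplified stand-in for SAX events: (kind, value) tuples.
Event = Tuple[str, str]

GeneratorFn = Callable[[bytes], List[Event]]          # adapter: octets -> XML events
TransformerFn = Callable[[List[Event]], List[Event]]  # filter:  XML events -> XML events
SerializerFn = Callable[[List[Event]], bytes]         # adapter: XML events -> octets


def run_pipeline(source: bytes, generate: GeneratorFn,
                 transforms: Iterable[TransformerFn], serialize: SerializerFn) -> bytes:
    """input -> generator -[pipe]-> transformer ... -[pipe]-> serializer -> output"""
    events = generate(source)
    for transform in transforms:
        events = transform(events)
    return serialize(events)


def lines_generator(source: bytes) -> List[Event]:
    # Adapt a plain-text octet stream into events: one <line> element per text line.
    events: List[Event] = [("start", "doc")]
    for line in source.decode("utf-8").splitlines():
        events += [("start", "line"), ("text", line), ("end", "line")]
    events.append(("end", "doc"))
    return events


def upper_transformer(events: List[Event]) -> List[Event]:
    # A filter: it never sees raw octets, only the event stream.
    return [(kind, value.upper()) if kind == "text" else (kind, value)
            for kind, value in events]


def xml_serializer(events: List[Event]) -> bytes:
    # Adapt the event stream back into octets (no escaping, for brevity).
    out = []
    for kind, value in events:
        if kind == "start":
            out.append(f"<{value}>")
        elif kind == "end":
            out.append(f"</{value}>")
        else:
            out.append(value)
    return "".join(out).encode("utf-8")
```

For example, run_pipeline(b"hello\nworld", lines_generator,
[upper_transformer], xml_serializer) yields
b"<doc><line>HELLO</line><line>WORLD</line></doc>". Swapping the two
adapters would not even type-check, which is the asymmetry described
above.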

I think it's time to challenge this concept.

                            - o -

What are sitemap resources?
---------------------------

Let me tell you: they are a mistake, a mistake I made while trying to
reduce sitemap verbosity and to fix a problem that had not yet emerged
at the time. Premature optimization is the root of all evil, and I see
that now: resources overlap with pipelines.

Let me show you why. Consider this sitemap snippet:

 <sitemap>
  <resources>
   <resource name="blah">
    <generate ../>
    <transform ../>
    <serialize ../>
   </resource>
  </resources>
 
  <pipelines>
   <pipeline internal-only="true">
    <match pattern="*">
     <call resource="blah"/>
    </match>
   </pipeline>
  </pipelines>
 </sitemap>

and now this

 <sitemap>
  <pipelines>
   <pipeline name="blah">
    <generate ../>
    <transform ../>
    <serialize ../>
   </pipeline>

   <pipeline>
    <match pattern="*">
     <call pipeline="blah"/>
    </match>
   </pipeline>
  </pipelines>
 </sitemap>

Which one is more semantically consistent? Can you say "named XSLT
templates"?

Composing pipelines
-------------------

Suppose the above syntax is introduced. At this point, we have four
different ways to call a pipeline:

 - as a pipeline
 - as a generator
 - as a transformer
 - as a serializer

Let me write the code so you understand what I mean:

[using a pipeline as a pipeline] (as today)

   <pipeline>
    <match pattern="*">
     <call pipeline="blah"/>
    </match>
   </pipeline>

Nothing fancy here: this is used mainly to reduce verbosity when the
same pipeline is used in different places.

[using a pipeline as a generator]

   <pipeline>
    <match pattern="*">
     <call pipeline="blah"/>
     <transform .../>
     <serialize ../>
    </match>
   </pipeline>

In this case, the serializer of the called pipeline is not used, and
the output of the last transformer of the named pipeline is connected
to the input of the transformer right after the call.

This is equivalent to *overloading* the serializer of the called
pipeline with the rest of the pipeline in place.

[using a pipeline as a transformer]

   <pipeline>
    <match pattern="*">
     <generate ../>
     <call pipeline="blah"/>
     <serialize ../>
    </match>
   </pipeline>

In this case, neither the generator nor the serializer of the named
pipeline is used.

This is equivalent to *overloading* both the generator and the
serializer of the called pipeline with the rest of the pipeline in
place.

[using a pipeline as a serializer]

   <pipeline>
    <match pattern="*">
     <generate ../>
     <transform ../>
     <call pipeline="blah"/>
    </match>
   </pipeline>

In this case, the generator of the named pipeline is not used.

This is equivalent to *overloading* the generator of the called pipeline
with the rest of the pipeline in place.
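All four cases above follow a single rule: whatever adapter the caller
already supplies overloads the corresponding adapter of the callee,
while the callee's transformers are always spliced in. Here is a
minimal Python sketch of that rule, assuming stages are modeled as
hypothetical (kind, name) tuples; expand_call is an illustrative helper,
not part of Cocoon.

```python
from typing import List, Tuple

# Hypothetical stage model: ("generate" | "transform" | "serialize", component name).
Stage = Tuple[str, str]


def expand_call(callee: List[Stage], before: List[Stage], after: List[Stage]) -> List[Stage]:
    """Effective stages when <call pipeline="..."/> sits between `before` and `after`.

    `before` is what the caller declares ahead of the call (a generator plus any
    transformers); `after` is what follows it (any transformers plus a serializer).
    Either list may be empty.
    """
    if not callee or callee[0][0] != "generate" or callee[-1][0] != "serialize":
        raise ValueError("a named pipeline must start with a generator and end with a serializer")
    if before and before[0][0] != "generate":
        raise ValueError("stages before the call must start with a generator")
    if after and after[-1][0] != "serialize":
        raise ValueError("stages after the call must end with a serializer")

    # The callee's transformers are always used.
    transformers = [stage for stage in callee if stage[0] == "transform"]
    # The caller's generator, if any, overloads the callee's generator.
    head = before if before else [callee[0]]
    # The caller's serializer, if any, overloads the callee's serializer.
    tail = after if after else [callee[-1]]
    return head + transformers + tail
```

With blah = [("generate", "file"), ("transform", "xslt"),
("serialize", "html")], calling expand_call(blah, [], []) returns blah
unchanged (pipeline as a pipeline), while expand_call(blah,
[("generate", "dir")], [("serialize", "xml")]) keeps only the "xslt"
transformer of blah (pipeline as a transformer).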

                               - o -

So, here is what I propose:

 - add the 'pipeline' attribute to 'map:call'
 - add the 'name' attribute to 'map:pipeline'
 - deprecate the 'map:resources' element
 - deprecate the 'internal-only' attribute of 'map:pipeline'
   [because named pipelines become implicitly internal-only]
 - allow 'map:call' to appear anywhere in a pipeline, performing the
   pipeline overloading behavior I explained above.
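Here is a minimal Python sketch of why named pipelines would be
implicitly internal-only, assuming a hypothetical Sitemap class (not
Cocoon's API). Named pipelines are stored apart from the matchers, so
an incoming request can only reach them through a call. For brevity it
handles only the "pipeline as a pipeline" case, and fnmatchcase is just
a stand-in for Cocoon's wildcard matcher (its '*' also matches '/').

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional, Tuple

# Hypothetical stage model: ("generate", "file"), ("call", "blah"), ...
Stage = Tuple[str, str]


class Sitemap:
    """Hypothetical model of the proposed semantics, not Cocoon code."""

    def __init__(self) -> None:
        # <pipeline name="...">: never matched directly, hence internal-only.
        self.named: Dict[str, List[Stage]] = {}
        # <match pattern="..."> entries found in anonymous pipelines.
        self.matches: List[Tuple[str, List[Stage]]] = []

    def add_named_pipeline(self, name: str, stages: List[Stage]) -> None:
        self.named[name] = stages

    def add_match(self, pattern: str, stages: List[Stage]) -> None:
        self.matches.append((pattern, stages))

    def handle_request(self, uri: str) -> Optional[List[Stage]]:
        # Only the matchers are consulted for external requests.
        for pattern, stages in self.matches:
            if fnmatchcase(uri, pattern):
                return self._resolve(stages)
        return None

    def _resolve(self, stages: List[Stage]) -> List[Stage]:
        # A match body made of a single <call pipeline="..."/> is replaced
        # by the named pipeline's stages (the "as a pipeline" case only).
        if len(stages) == 1 and stages[0][0] == "call":
            return list(self.named[stages[0][1]])
        return list(stages)
```

Registering blah as a named pipeline and matching "*" to
[("call", "blah")] makes handle_request("index.html") return blah's
stages, while a sitemap with no matchers answers None even for the URI
"blah": the name alone exposes nothing.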

What do you think?  

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<[EMAIL PROTECTED]>                             Friedrich Nietzsche
--------------------------------------------------------------------

