[RT] Using pipeline as sitemap components (long)

Sylvain Wallez Thu, 21 Nov 2002 02:06:42 -0800

The discussions around Stefano's "Cocoon blocks version 1.1" showed the need for pipelines to provide not only resources, but also services, identified by their URI.

This document defines this concept of "pipeline service", which, as we will see, consists of using pipelines as sitemap components (generator, transformer and serializer). It is kept separate from the blocks design document since pipeline services can be used without blocks, although they will be most useful in that context.

What is a pipeline ?
--------------------

The concept of pipeline, a central part of the Cocoon architecture, is a chain of components handling XML documents as SAX events. By "handling", we mean 3 different things :

- generate : at the start of the chain, produce an initial document and feed the next component in the chain with the result.

- transform : take the content produced by preceding components in the chain (either a generator or another transformer), transform it and feed the next component in the chain with the result.

- serialize : take the content produced by the preceding component in the chain (either a generator or a transformer), and convert this XML stream to a binary stream.

These 3 concepts are represented using only 2 interfaces, XMLProducer and XMLConsumer :
- a generator is an XMLProducer,
- a transformer is an XMLConsumer _and_ an XMLProducer,
- a serializer is an XMLConsumer.
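
To make the three roles concrete, here is a minimal sketch in Python rather than Cocoon's Java. It is a toy model, not Cocoon's API: events are plain strings standing in for SAX events, and the chain is pull-based (iterators) whereas SAX is push-based. All function names are hypothetical.

```python
from typing import Iterable, Iterator

Event = str  # stand-in for one SAX event (assumption: real events are richer)


def generate() -> Iterator[Event]:
    """Generator role: a producer only, it starts the chain."""
    yield "<doc>"
    yield "hello"
    yield "</doc>"


def upper_case(events: Iterable[Event]) -> Iterator[Event]:
    """Transformer role: a consumer and a producer at the same time."""
    for event in events:
        yield event.upper()


def serialize(events: Iterable[Event]) -> bytes:
    """Serializer role: a consumer only, it ends the chain with bytes."""
    return "".join(events).encode("utf-8")


# generate -> transform -> serialize
output = serialize(upper_case(generate()))
```

Only the middle function both accepts and returns events, which mirrors the fact that only a transformer implements both interfaces.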

The "cocoon:" protocol
----------------------

Up to now, we've considered pipelines as a "final" concept. This means that a pipeline has to be considered as a whole : it handles a request and answers with the result of its execution.

Well, in fact, we "nearly" considered it as final. Consider the "cocoon:" protocol that is so useful. What happens if we write the following :

<map:match pattern="first-uri">
<map:generate type="file" src="cocoon://other-uri"/>
<map:transform src="foo.xslt"/>
<map:serialize/>
</map:match>

We're simply using another pipeline as the starting point of the current one. We have used a pipeline as the generator of another one.

Most often, the "other-uri" builds a pipeline that is terminated by a <map:serialize type="xml"/> because we want it to produce xml for the calling generator. But this serializer is a fake : you can put any serializer you like, it doesn't matter. What happens under the hood is that the SAX events produced by the component immediately preceding the serializer are used as the output of the generator in the calling pipeline.

So in the above example, when requesting "first-uri", we actually chain the generators and transformers of "other-uri" to the transformers and serializer of "first-uri".

Pipelines as generators
-----------------------

This leads to a first conclusion : using a pipeline as a generator means using the SAX events produced by the last XMLProducer of that pipeline, i.e. the last transformer or the generator if there are no transformers.
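
This first conclusion can be sketched as follows, in Python and as a toy model only. The `Pipeline` tuple, `last_producer_events` and `run` are hypothetical names, events are plain strings, and nothing here is Cocoon's actual API. The point is that the serializer of the called pipeline is never invoked.

```python
from typing import Callable, Iterable, List, NamedTuple

Event = str  # stand-in for one SAX event


class Pipeline(NamedTuple):
    generator: Callable[[], Iterable[Event]]
    transformers: List[Callable[[Iterable[Event]], Iterable[Event]]]
    serializer: Callable[[Iterable[Event]], bytes]


def last_producer_events(called: Pipeline) -> Iterable[Event]:
    """Events of the last XMLProducer; the serializer is never invoked."""
    events = called.generator()
    for transform in called.transformers:
        events = transform(events)
    return events


def run(pipeline: Pipeline) -> bytes:
    """Execute a pipeline as a whole, serializer included."""
    return pipeline.serializer(last_producer_events(pipeline))


# "other-uri": its serializer is a fake and must not influence the caller.
other = Pipeline(
    generator=lambda: ["<a>", "x", "</a>"],
    transformers=[lambda evs: [e.replace("a", "b") for e in evs]],
    serializer=lambda evs: b"ignored",
)

# "first-uri": generated from "other-uri", then transformed and serialized.
first = Pipeline(
    generator=lambda: last_producer_events(other),
    transformers=[lambda evs: [e.upper() for e in evs]],
    serializer=lambda evs: "".join(evs).encode("utf-8"),
)

result = run(first)
```

Requesting `other` directly still goes through its own serializer; only the call from `first` bypasses it.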

Since we've used a pipeline as a generator, let's introduce a new generator for this purpose, instead of using the "file" one, which fools us into thinking we use a full pipeline when it actually strips out the serializer :

<map:match pattern="first-uri">
<map:generate type="pipeline" src="/other-uri"/>
<map:transform src="foo.xslt"/>
<map:serialize/>
</map:match>

I don't see a need for a new sitemap element such as "map:call-pipeline" or "map:generate-from-pipeline". What we want is to generate an initial content in the current pipeline, and for this we just use a particular implementation of a generator, as we already do for files, XSP, etc.

Pipelines as serializers
------------------------

We've seen how to use a pipeline as the generator of another one, let's consider now the other end of the chain : using a pipeline as a serializer.

Let's suppose we have defined a pipeline that gets an XML document in the xdoc DTD and formats it to PDF. This can be for example :

<map:match pattern="doc2pdf">
<map:generate src="an_xdoc.xml"/>
<map:transform src="doc2fo.xslt"/>
<map:serialize type="fo2pdf"/>
</map:match>

The interesting part here isn't the initial document, but the chaining of a stylesheet that produces an xsl:fo version of its input and the FOP serializer. This is the typical example of what is called a "service" in the current block specification.

Now how do we reuse this in other pipelines ? Yes, we can define a <map:resource>. But this resource will be available only in the current sitemap, and not in other sitemaps nor blocks.

What does "reusing" this actually mean ? It means producing an xdoc document and _serializing_ it to PDF. We don't actually care if there is a serializer to PDF that directly accepts xdocs or if there are one or more transformations before serializing.

This leads to a second conclusion : using a pipeline as a serializer means sending the SAX events of the calling pipeline to the first XMLConsumer of the called pipeline.
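
As a sketch of this second conclusion (Python, toy model, hypothetical names, not Cocoon's API): the calling pipeline's events enter the called pipeline at its first consumer, so the called generator never runs. The transformer and serializer below are crude stand-ins for doc2fo.xslt and fo2pdf.

```python
from typing import Callable, Iterable, List, NamedTuple

Event = str  # stand-in for one SAX event


class Pipeline(NamedTuple):
    generator: Callable[[], Iterable[Event]]
    transformers: List[Callable[[Iterable[Event]], Iterable[Event]]]
    serializer: Callable[[Iterable[Event]], bytes]


def as_serializer(called: Pipeline) -> Callable[[Iterable[Event]], bytes]:
    """Enter the called pipeline at its first consumer, skipping its generator."""

    def serialize(events: Iterable[Event]) -> bytes:
        for transform in called.transformers:
            events = transform(events)
        return called.serializer(events)

    return serialize


def unused_generator() -> Iterable[Event]:
    raise AssertionError("the generator of the called pipeline must not run")


doc2pdf = Pipeline(
    generator=unused_generator,
    # stand-in for doc2fo.xslt
    transformers=[lambda evs: ["fo:" + e for e in evs]],
    # stand-in for the fo2pdf serializer
    serializer=lambda evs: ("PDF[" + ",".join(evs) + "]").encode("utf-8"),
)

caller_events = ["title", "para"]  # stand-in for another_xdoc.xml
pdf = as_serializer(doc2pdf)(caller_events)
```

If the called pipeline has no transformer, the loop does nothing and the events go straight to its serializer, which is then the first consumer.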

How do we use this ? Well, just as for the generator, let's define a new "pipeline" serializer :

<map:generate src="another_xdoc.xml"/>
<map:serialize type="pipeline" src="doc2pdf"/>

Note : the "src" attribute doesn't currently exist on <map:serialize>, but it seems the most natural and consistent way to name the called pipeline. Whether this translates to implementing SitemapModelComponent or not is another story.

Pipelines as transformers
-------------------------

And here comes the last use of a pipeline : as a transformer. Let's consider the following :

<map:match pattern="a_page">
<map:generate src="an_xdoc.xml"/>
<map:transform type="i18n"/>
<map:transform src="xdoc2html.xsl"/>
<map:transform src="htmlskin.xsl"/>
<map:serialize type="html"/>
</map:match>

The 3 transformers define a transformation service that takes an xdoc as input and produces some skinned html. To achieve reusability, we would like to have a "xdoc2skinnedHtml" transformer. We can write this like the following :

<map:match pattern="a_page">
<map:generate src="an_xdoc.xml"/>
<map:transform type="pipeline" src="xdoc2skinnedHtml"/>
<map:serialize type="html"/>
</map:match>

and

<map:match pattern="xdoc2skinnedHtml">
<map:generate type="dont_care"/>
<map:transform type="i18n"/>
<map:transform src="xdoc2html.xsl"/>
<map:transform src="htmlskin.xsl"/>
<map:serialize type="dont_care"/>
</map:match>

This leads to a third conclusion : using a pipeline as a transformer means feeding the SAX events of the calling pipeline to the first transformer of the called pipeline, and sending the output of the last transformer of the called pipeline to the next XMLConsumer of the calling pipeline.

Note : if there are no transformers in the called pipeline (i.e. it's only a generator and a serializer), the "pipeline" transformer does nothing and only copies its input to its output.
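
A sketch of this third conclusion, including the no-transformer case from the note (Python, toy model, hypothetical names, not Cocoon's API). The three lambdas are crude stand-ins for the i18n transformer and the two stylesheets.

```python
from typing import Callable, Iterable, List, NamedTuple

Event = str  # stand-in for one SAX event


class Pipeline(NamedTuple):
    generator: Callable[..., Iterable[Event]]
    transformers: List[Callable[[Iterable[Event]], Iterable[Event]]]
    serializer: Callable[..., bytes]


def as_transformer(called: Pipeline) -> Callable[[Iterable[Event]], Iterable[Event]]:
    """Use only the transformers of the called pipeline, in order."""

    def transform(events: Iterable[Event]) -> Iterable[Event]:
        for step in called.transformers:
            events = step(events)
        return events  # unchanged input when there are no transformers

    return transform


def dont_care(*args):
    raise AssertionError("generator and serializer of the called pipeline must not run")


xdoc2skinned_html = Pipeline(
    generator=dont_care,
    transformers=[
        lambda evs: [e.replace("key:greeting", "Hello") for e in evs],  # i18n
        lambda evs: ["<p>" + e + "</p>" for e in evs],                  # xdoc2html.xsl
        lambda evs: ["<body>"] + list(evs) + ["</body>"],               # htmlskin.xsl
    ],
    serializer=dont_care,
)

skinned = as_transformer(xdoc2skinned_html)(["key:greeting"])
passthrough = as_transformer(Pipeline(dont_care, [], dont_care))(["x"])
```

Here `skinned` holds the events that the caller's html serializer would receive, and `passthrough` shows the identity behaviour.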

Relation to blocks
------------------

Up to now, we made no mention of blocks. The "src" attribute of the new "pipeline" sitemap components is a URI that is considered as what follows the first "/" in the "cocoon:" protocol :
- "/pipeline-uri" is resolved by calling the root sitemap,
- "pipeline-uri" is resolved by calling the current sitemap.

We can now introduce blocks :
- "block:foo:pipeline-uri" is resolved by calling the "foo" block.
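
The three resolution rules can be sketched as a small function (Python, illustration only). The `Target` tuple and `resolve` are hypothetical names, and stripping the leading "/" for the root sitemap is my assumption rather than something specified above.

```python
from typing import NamedTuple, Optional


class Target(NamedTuple):
    scope: str            # "root", "current" or "block"
    block: Optional[str]  # block name, only when scope == "block"
    uri: str              # pipeline URI inside the chosen sitemap


def resolve(src: str) -> Target:
    """Decide which sitemap a "pipeline" component's src attribute addresses."""
    if src.startswith("block:"):
        # "block:<name>:<pipeline-uri>"; raises ValueError if a part is missing
        _, block, uri = src.split(":", 2)
        return Target("block", block, uri)
    if src.startswith("/"):
        return Target("root", None, src[1:])
    return Target("current", None, src)
```

For example, `resolve("block:skin:xdoc2skinnedHtml")` addresses the "xdoc2skinnedHtml" pipeline of the "skin" block.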

So if we consider the transformer example above, and move the "xdoc2skinnedHtml" pipeline to a "skin" block, our sitemap becomes :

<map:match pattern="a_page">
<map:generate src="an_xdoc.xml"/>
<map:transform type="pipeline" src="block:skin:xdoc2skinnedHtml"/>
<map:serialize/>
</map:match>

Questions and answers
---------------------

Q: What about caching when we call a pipeline ?

A: This should integrate smoothly : the cache key and validity of the "pipeline" generator, transformer and serializer are the composition of cache keys and validities of the used components of the called pipeline.
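
To illustrate what "composition" could mean here, a rough sketch in Python. It does not use Cocoon's real caching interfaces: `Component`, the key strings and the "|" separator are all assumptions made for the example. Only the components that actually run for a given use contribute to the key and to the validity.

```python
from typing import List, NamedTuple


class Component(NamedTuple):
    role: str    # "generator", "transformer" or "serializer"
    key: str     # cache key of this component (hypothetical format)
    valid: bool  # whether its cached content is still valid


# Roles of the called pipeline that actually run for each kind of use.
USED_ROLES = {
    "generator": {"generator", "transformer"},
    "transformer": {"transformer"},
    "serializer": {"transformer", "serializer"},
}


def used_components(called: List[Component], use: str) -> List[Component]:
    return [c for c in called if c.role in USED_ROLES[use]]


def cache_key(called: List[Component], use: str) -> str:
    """Composite key: the keys of the used components, in pipeline order."""
    return "|".join(c.key for c in used_components(called, use))


def is_valid(called: List[Component], use: str) -> bool:
    """Composite validity: every used component must still be valid."""
    return all(c.valid for c in used_components(called, use))


doc2pdf = [
    Component("generator", "file:an_xdoc.xml", False),  # source file changed
    Component("transformer", "xslt:doc2fo.xslt", True),
    Component("serializer", "fo2pdf", True),
]
```

With this model, a stale generator in "doc2pdf" does not invalidate its use as a serializer, since the generator is not one of the used components.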

--o--

Q: Doesn't this deprecate the use of the "cocoon:" protocol ?

A: No. The only notation that may be deprecated is <map:generate type="file" src="cocoon://xxx"/> that can now be written <map:generate type="pipeline" src="/xxx"/>. Other uses of the "cocoon" protocol keep their usefulness.

--o--

Q: I want to define a pipeline that will be used only as a transformation service. Why must I write a <map:generate> and a <map:serialize> in its definition ?

A: Because the sitemap, as a pipeline building language, must be able to determine the start of a pipeline and its end, even if not all its components are used. Like opening and closing braces in Java, the generator begins the pipeline definition and the serializer ends it.

Ok. Thanks for reading so far. What are your thoughts about this ? If we agree on it, I'll update the Cocoon blocks document so that block services are shown as "pipeline" sitemap components.

Sylvain

--
Sylvain Wallez Anyware Technologies
http://www.apache.org/~sylvain http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
