[RT][long] Cocoon 3.0: the necessary mutation

Sylvain Wallez Fri, 02 Dec 2005 13:11:09 -0800

Hi all,

For many years, I have been more than happy with Cocoon, enjoying the power and ease it brought for both publication and webapp projects. Over the last months however, other feelings have emerged: there are things that are definitely overly complex in Cocoon, and there have been some emerging frameworks leading to "wow, cool!" reactions rather than "yeah, yet another one". Also, I strongly believe that the Ajax revolution is quickly obsoleting the traditional reload-page-on-user-action model that prevailed on the web until recently, and that it requires frameworks that help build these new kinds of applications.

All this to say that if we want Cocoon to have a bright future, it must go through a mutation. This RT dumps my current ideas about this, depicting something that IMO better fits today's needs while taking into account many years of innovation and what we learned from them.


                      -- oOo --

First of all, let's consider the place where Cocoon fits in the large family of webapp development frameworks.

On one end, we have pure-J2EE things like Struts or JSF. They lead to writing a lot of Java classes and lots of XML config files, and try/fail cycles require compile/deploy/restart, even if some tools ease the task. Despite their heavyweight development process, they are widely accepted in large companies, both because the J2EE stamp pleases managers and because of the vast number of enterprise-grade libraries available.

On the other end, we have scripted frameworks like Ruby on Rails, Django[1], etc. The try/fail cycle is basically save/reload, and writing simple stuff is very fast because of the use of convention over configuration and runtime generation of data models (e.g. through database introspection). Now recent comments[2] show that going beyond the basic stuff may not be that easy.


Cocoon actually sits in the middle of this spectrum:

- it's a Java servlet and can therefore use almost anything that's written in Java. The contents of our blocks show this well! As such it is somewhat J2EE compliant and can be deployed in large companies, even if we have to convince managers that it's better than Struts.

- it's a scripted framework: sitemap, XSL, templates, flowscript. Save and reload! And this goes further in 2.2 with auto reloading and compiling classloaders.

So theoretically, Cocoon could be the "RoR of J2EE". Now in its current incarnation it won't be. The learning curve is too steep, some architectural choices imposed by Cocoon actually get in the way of developers with the new emerging development practices, and 5 years of legacy have led to a rather confusing picture, with tons of legacy components and many inconsistencies.

Now Cocoon has also introduced a number of super-cool features and innovations that definitely make sense, but in a more lightweight and consistent environment.

So let's draw a picture of what could be Cocoon 3.0. I use a major version as the ideas outlined below are more than likely to require a new code base, even if many code snippets can be reused from the current code.


                      -- oOo --

Giving its real role to the controller
--------------------------------------

When we introduced flowscript, we decided that <map:pipeline> should be the central switchboard through which *all* requests go, and introduced <map:call function>. This leads most webapps written in Cocoon to have their sitemap starting with something like:


 <map:match pattern="do_*">
   <map:call function="do_{1}"/>
 </map:match>

Why in hell do we have to go through the sitemap to call a function and then go back to the sitemap through cocoon.sendPage()? This not only clutters up the sitemap with cut'n pasted snippets, but also makes the flowscript a second-class citizen in the application.

So I think we should change the semantics of the sitemap a bit, along with the priorities of its various components:

- the sitemap is the configuration of the overall request processing of the application (or a subpart of it in the case of mounts). It defines the configuration of that request processing, which is composed of components (<map:components>), controllers (<map:flow>) and views (<map:pipeline>). And I even think these 3 parts should really be split into different files, i.e. moving components to a xconf and pipeline definitions to e.g. "pipelines.xml".

- the processing flow in a sitemap goes *first* in the controller if there is one, and *second* in the view. Going to a <map:pipeline> to call back a function should really be an exceptional case, or even forbidden.

- since it's no longer called by the sitemap, the controller defines a single entry point, such as "process()". A builtin default implementation provides an equivalent to <map:call function="public_{request:sitemapURI}"/>, thus automatically publishing any public_xxx flowscript function. This is similar to HttpServlet.service(), which calls doGet(), doPost(), etc. depending on the HTTP method but still allows overriding service().

- to allow sophisticated implementations of process() where needed, the matchers and selectors are made available to the controller, so that they can check the request environment as easily as the pipeline statements in sitemap.xmap (also have a look at the Django dispatcher[3]). This would allow writing something like:


 var match = cocoon.matchers.wildcard("admin/*");
 if (match) {
     if (authenticateAdmin()) {
         // call the function named by the '*'
         adminServices[match[1]]();
     } else {
         forbidden();
     }
 }

- calling cocoon.sendPage(uri) directly goes to the <map:pipeline> section of the sitemap, to build a view. This seems obvious, but has an interesting side-effect: there is no more need to invent a private URL space such as "view-*" to have a two-step processing (controller/view) of the requests. We can even say that "cocoon.sendPage(null)" calls the sitemap with the current request URI untouched.
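
The dispatch rule from the list above can be sketched in plain JavaScript that runs outside Cocoon. Everything in it is made up for illustration: makeDefaultProcess, the controller object and the sendPage stand-in are assumptions, not an existing Cocoon API.

```javascript
// Hypothetical sketch: makeDefaultProcess is not an existing Cocoon API.
// It builds a default process() that publishes every public_xxx function
// of a controller, as the proposed builtin implementation would.
function makeDefaultProcess(controller, sendPage) {
    return function process(sitemapURI) {
        const name = "public_" + sitemapURI;
        const isPublished =
            Object.prototype.hasOwnProperty.call(controller, name) &&
            typeof controller[name] === "function";
        if (isPublished) {
            // The controller handles the request first.
            return controller[name]();
        }
        // Nothing published under that name: go to the view with the
        // request URI untouched, which is what sendPage(null) would mean.
        return sendPage(null);
    };
}

// Stand-ins for a flowscript and for cocoon.sendPage (assumptions).
const calls = [];
const controller = {
    public_login: function () { calls.push("controller:login"); return "handled"; },
    helper: function () { return "never published"; }
};
const dispatch = makeDefaultProcess(controller, function (uri) {
    calls.push("view:" + (uri === null ? "current URI" : uri));
    return "view";
});

const first = dispatch("login");   // dispatched to public_login
const second = dispatch("helper"); // no public_helper: falls through to the view
```

Only functions carrying the public_ prefix are reachable from a URL, so helper functions in the same script stay private without any sitemap entry.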

Note: the controller examples in this RT are written in JavaScript, but JavaFlow should be considered on an equal footing, as Cocoon's user base is two-sided, composed of people coming from the webdesign/php world, and others coming from the J2EE world. Also, JavaFlow should be the language of choice for builtin reusable helper controllers.


                      -- oOo --

Expression languages
--------------------

Do you know how many expression languages there are in Cocoon? Java, JavaScript, XPath, XReporter, JEXL, etc. There are also all the micro-languages defined by each of the input modules: many of them use XPath, but not all...

Also, the way to access a given piece of data is not the same in the sitemap (e.g. "{request-param:foo}") and in JXTG ("${cocoon.request.getParameter('foo')}" or even "#{$cocoon/request/parameters/foo}").

We should restrict the number of languages to the useful minimum, and ensure they can be used consistently everywhere. This useful minimum looks to me as being JavaScript, XPath and Java (using Janino[4]).

As for the syntax, I think we should use the simple "{..}" notation, with no initial character. To choose among the 3 expression languages, we have to choose a default one, and use prefixed expressions for the other ones. I consider JS to be the most versatile and thus to be the default language.

That means we'll have "{cocoon.request.remoteHost}" or "{xpath:$cocoon/request/remoteHost}" or "{java:cocoon.getRequest().getRemoteHost()}".

About XPath, I'm a bit skeptical wrt its actual usefulness with non-XML objects, which often looks weird. However, we need to be able to call XPath on DOM parts of a non-DOM data model, e.g. "{xpath(cocoon.session.attributes.userDoc, '/meta/dc:title')}". And interestingly this sample shows that a namespace prefix table must be available in the expression context for *all* languages.

All this also means that we need a well-defined "cocoon" object defined identically in all contexts. Additional top-level objects can be available to provide context-specific data, such as "flow.sendPage()", "sitemap.resolve('../1')" or "template.consumer".
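
To make the "{..}" convention concrete, here is a minimal sketch of how an expression string could be split into a language and a source, with JS as the default. The function name parseExpression and the "js" label for the default language are assumptions for illustration only; evaluating the expression is left out.

```javascript
// Hypothetical sketch of the proposed "{..}" notation; not an existing Cocoon API.
// Only prefixes listed here select a language; anything else is JavaScript.
const KNOWN_PREFIXES = ["xpath", "java"];

function parseExpression(text) {
    const wrapped = /^\{([\s\S]*)\}$/.exec(text);
    if (!wrapped) {
        return null; // not an expression: the caller treats it as a literal
    }
    const body = wrapped[1];
    const prefix = /^([a-z]+):/.exec(body);
    if (prefix && KNOWN_PREFIXES.indexOf(prefix[1]) !== -1) {
        return { language: prefix[1], source: body.slice(prefix[0].length) };
    }
    // No known prefix: fall back to the default language.
    return { language: "js", source: body };
}

const byDefault = parseExpression("{cocoon.request.remoteHost}");
const withXPath = parseExpression("{xpath:$cocoon/request/remoteHost}");
const withJava = parseExpression("{java:cocoon.getRequest().getRemoteHost()}");
// A JS call to an xpath() helper is still JS: there is no "xpath:" prefix.
const xpathCall = parseExpression("{xpath(cocoon.session.attributes.userDoc, '/meta/dc:title')}");
```

Restricting the prefix check to a known list keeps a colon inside a JS expression (as in 'dc:title' above) from being mistaken for a language selector.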


                      -- oOo --

Content-aware pipelines
-----------------------

Cocoon 1 had a DOM-based processing, meaning transformations could be chosen according to the pipeline content. Cocoon 2, when switching to SAX-based streamed pipelines, abandoned this ability. This wasn't a real problem for a long time, as datasources were mostly passive documents of a well-known structure.

Now things have changed a lot, and we have to deal with heterogeneous data types and content-driven processing. Let's take some real-life examples:

- Content syndication: a feed's URL can provide RSS 0.9, 1.0, 2.0 or Atom. How can we decide what processing has to be applied on a feed if we don't know what's inside?

- Forrest's infamous SourceTypeAction[5] identifies a document's type using pull parsing.

- SOAP requests: why is SOAP so badly integrated with Cocoon? We basically need to delegate to Axis that will then call a Java class. Why so? Because we're unable to choose the service to be called depending on the request's content.

- finally, the ESB buzz is turning into real projects, and requires content-based routing of messages.

There were some proposals to implement content-aware selectors[6] but they never materialized because of the impedance mismatch between a SAX stream and the usage of DOM (so Cocoon-1-ish!) that was proposed to implement them.


Now Forrest's SourceTypeAction shows us the way: pull parsing.

So let's switch pipelines from SAX push to StAX pull (JSR 173, see [7]). Content-aware matchers and selectors can then grab just the amount of information they need from the pipeline to make their decision. And unlike the SourceTypeAction, which requires resolving the source twice (once for pull, once for push), the pipeline engine can transparently buffer the StAX events consumed by matchers and selectors to replay them in the next pipeline component.
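
The buffer-and-replay idea can be sketched with plain JavaScript iterators standing in for StAX event readers. This is only an illustration under that assumption: replayable(), peekUntil() and the event objects are invented names, not Cocoon or StAX APIs.

```javascript
// Hypothetical sketch: replayable() is an illustration, not a Cocoon or StAX API.
// Plain JavaScript iterators stand in for StAX event readers.
function replayable(source) {
    const iterator = source[Symbol.iterator]();
    const buffer = [];
    return {
        // Pull events until predicate(event) is true, keeping every consumed
        // event in the buffer. Returns the matching event, or null.
        peekUntil(predicate) {
            for (const event of buffer) {
                if (predicate(event)) return event;
            }
            let step;
            while (!(step = iterator.next()).done) {
                buffer.push(step.value);
                if (predicate(step.value)) return step.value;
            }
            return null;
        },
        // Replay the buffered events first, then continue with the source.
        *events() {
            while (buffer.length > 0) yield buffer.shift();
            let step;
            while (!(step = iterator.next()).done) yield step.value;
        }
    };
}

// A fake feed; `pulled` counts how many events were really read from it.
let pulled = 0;
function* feed() {
    const events = [
        { type: "startDocument" },
        { type: "startElement", name: "rss" },
        { type: "startElement", name: "channel" },
        { type: "endElement", name: "channel" },
        { type: "endElement", name: "rss" }
    ];
    for (const event of events) {
        pulled++;
        yield event;
    }
}

const pipeline = replayable(feed());
// A content-aware matcher only needs the root element to pick a branch.
const root = pipeline.peekUntil(e => e.type === "startElement");
const pulledByMatcher = pulled;
// The next component still sees the whole stream, from the first event.
const replayed = Array.from(pipeline.events());
```

The source is read once: the matcher consumes only as many events as it needs, and the downstream component gets those same events back before the rest of the stream.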

Using pull pipelines doesn't mean we have to trash everything. Converting DOM to/from StAX is straightforward, and so is StAX->SAX. The SAX->StAX conversion is less easy and requires either buffering or a separate thread.

Using pull pipelines also has an interesting side effect on aggregations, as they can easily be inlined by pulling events successively from partial pipelines (i.e. without a serializer), e.g.:


 <map:aggregate element="root">
   <map:part>
     <map:generate src="header.xml"/>
     <map:transform src="header2html.xsl"/>
   </map:part>
   <map:part src="content/{1}.xml"/>
 </map:aggregate>
 <map:transform src="layout.xsl"/>
 <map:serialize/>

Actually, writing

 <map:part src="foo"/>

is equivalent to writing

 <map:part>
   <map:generate type="file" src="foo"/>
 </map:part>
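
Why pull makes inlined aggregation easy can be shown with a few lines of plain JavaScript. This is a sketch under loud assumptions: events are plain objects, each partial pipeline is any iterable of events that emits no document start/end events, and aggregate() is an invented name.

```javascript
// Hypothetical sketch: events are plain objects and each partial pipeline is
// any iterable of events (no serializer, no document start/end events).
function* aggregate(rootName, parts) {
    yield { type: "startElement", name: rootName };
    for (const part of parts) {
        // Pull this part's events until it is exhausted, then move on.
        yield* part;
    }
    yield { type: "endElement", name: rootName };
}

// Stand-ins for the two <map:part> pipelines of the sitemap example.
function* element(name) {
    yield { type: "startElement", name: name };
    yield { type: "endElement", name: name };
}

const aggregated = Array.from(
    aggregate("root", [element("header"), element("content")])
);
// Render the event list as tags, just to make the nesting visible.
const summary = aggregated
    .map(e => (e.type === "startElement" ? "<" : "</") + e.name + ">")
    .join("");
```

The aggregator is itself just another event source, so it can feed the following layout transformation without any intermediate buffering.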

                      -- oOo --

Dynamic pipelines
-----------------

Yes, you read that right: dynamic pipelines. This is what comes next naturally after content-aware pipelines: with use cases like webservices and ESBs, content-based routing is not enough and we also need controller-driven routing.

For simpler cases, we already have cocoon.processPipelineTo() (and the more versatile PipelineUtils class), but having to call the sitemap and invent a private URL just to perform a transformation is really overkill.


I'd like to be able to write the following in a flowscript:

 var pipeline = flow.newPipeline("non-caching");
 pipeline.setGenerator("stream");
 pipeline.addTransformer("xslt", "normalize.xsl");
 if (cocoon.matchers.xpath("/foo/[EMAIL PROTECTED] = '" +
         cocoon.session.attributes.bar_id + "']", pipeline)) {
     handleBar(pipeline);
 } else {
     wrongId();
 }

What we can see above is that we don't even need a serializer for the pipeline to be useful, as we can pull events from it as soon as it has a generator. And that generator could well be another pipeline built somewhere else.
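
A toy model of such a pipeline object, in plain JavaScript, shows both points: events can be pulled without a serializer, and one pipeline can serve as the generator of another. The Pipeline class below only mimics the shape of the flowscript example; taking functions instead of component names is a simplification, and none of it is the Cocoon pipeline API.

```javascript
// Hypothetical sketch: this Pipeline class only mimics the shape of the
// flowscript example; it is not the Cocoon pipeline API.
class Pipeline {
    constructor() {
        this.generator = null;   // function returning an iterable of events
        this.transformers = [];  // functions mapping an iterable to an iterable
    }
    setGenerator(generator) {
        this.generator = generator;
        return this;
    }
    addTransformer(transformer) {
        this.transformers.push(transformer);
        return this;
    }
    // Pull-side entry point: no serializer is needed to read events.
    events() {
        if (this.generator === null) {
            throw new Error("a pipeline needs a generator before it can be pulled");
        }
        return this.transformers.reduce((stream, t) => t(stream), this.generator());
    }
}

// A transformer is just a generator function wrapping its input stream.
function* upperCaseNames(stream) {
    for (const event of stream) {
        yield { type: event.type, name: event.name.toUpperCase() };
    }
}

const inner = new Pipeline().setGenerator(() => [
    { type: "startElement", name: "foo" },
    { type: "endElement", name: "foo" }
]);
// The generator of one pipeline can be another pipeline built elsewhere.
const outer = new Pipeline()
    .setGenerator(() => inner.events())
    .addTransformer(upperCaseNames);

const names = Array.from(outer.events(), e => e.name);
```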

Basically, the pipeline engine becomes a very general-purpose object that can be used not only in the sitemap (to build views), but also in the controller for content-driven business logic decisions.

This programmatic building of pipelines can also be used by Cocoon components themselves to implement some built-in transformations, e.g. converting an XMLSchema to a CForms definition, without requiring users to copy/paste the corresponding sitemap instructions into their sitemaps, or to call a system-defined sitemap. IMO, the lack of reusable system pipelines is one of the reasons why there haven't been many off-the-shelf products or applications built on top of Cocoon.

Being able to directly use pipelines can also ease the integration of Cocoon as a transformation engine in other environments, such as an advanced message transformer in the ServiceMix ESB[8].


                      -- oOo --

Controller-driven responses
---------------------------

The advent of Ajax applications leads to a radical change in web application architectures. There are many requests that don't lead to producing a view, but to sending data and/or control information. Having to call a pipeline for this is really useless and overkill, as we don't need any kind of processing.

We therefore need the controller to be able to directly send a non-processed response. We already have an example of this in the Ajax stuff for CForms[9] to send a simple <bu:continue> when form interaction is finished and a full page reload is needed. Another example is data transmission with an Ajax client using JSON[10].

So we need additional "sendxxx" methods in the controller: sendText(), sendObject(), sendBytes() and why not sendStream().
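
As a rough idea of what two of these methods could do, here is a sketch against a fake response object. makeSenders, the response fields and the content types are all assumptions chosen for the example; only the method names sendText() and sendObject() come from the proposal.

```javascript
// Hypothetical sketch: makeSenders and the response object are stand-ins,
// not the Cocoon flow API. The content types are assumptions too.
function makeSenders(response) {
    return {
        // Send a string as-is, without going through a pipeline.
        sendText(text, contentType) {
            response.contentType = contentType || "text/plain";
            response.body = String(text);
        },
        // Send a data object to an Ajax client, serialized as JSON.
        sendObject(object) {
            response.contentType = "application/json";
            response.body = JSON.stringify(object);
        }
    };
}

const response = { contentType: null, body: null };
const flow = makeSenders(response);
flow.sendObject({ items: 2, total: 19.9 });
```

No sitemap, generator or serializer is involved: the controller writes the response directly.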

Ajax applications also require aggregations defined at the controller level. Let's consider an Ajax shopping cart application: the page displays the items catalogue and a sidebar with the current content of the shopping cart. When the user browses the items, only the catalogue area needs to be refreshed on the page. When he adds an item to the cart, both areas need to be refreshed at once to show the updated cart. The knowledge of what parts of the page need to be refreshed is in the controller. A solution can be to call a pipeline that will generate an XInclude that itself will call other pipelines, but that's smelly and doesn't allow giving different view data to each of the pipelines.


To allow this, we need something like:

 flow.sendMultiple(
     ["catalogue", { paginator: paginator }],
     ["cart-sidebar", { cart: cart }]
 );
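
A minimal sketch of what sits behind such a call, assuming a renderView(viewName, viewData) function that stands in for "run the named view pipeline with this view data". Both makeSendMultiple and renderView are invented for the example, and how the parts are combined into one HTTP response is deliberately left open.

```javascript
// Hypothetical sketch: renderView stands in for "run the named view pipeline
// with this view data"; neither name is an existing Cocoon API.
function makeSendMultiple(renderView) {
    return function sendMultiple(...parts) {
        // Each part is a [viewName, viewData] pair with its own view data.
        return parts.map(([viewName, viewData]) => ({
            view: viewName,
            content: renderView(viewName, viewData)
        }));
    };
}

// A fake renderer that just reports which data each view received.
const sendMultiple = makeSendMultiple((viewName, viewData) =>
    viewName + " rendered with " + Object.keys(viewData).join(","));

const refreshed = sendMultiple(
    ["catalogue", { paginator: { page: 2 } }],
    ["cart-sidebar", { cart: { items: 3 } }]
);
```

Each view only sees the data the controller chose to give it, which is exactly what the XInclude workaround cannot do.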

                      -- oOo --

Core components
---------------

Moving to pull pipelines isn't the only important core change: we need to move away from Avalon for good. Now what container will we use? We don't care: Cocoon 3.0 will be written as POJOs, and will come with a "default" container. Will it be Spring, Hivemind, Pico? I don't know. We may even provide configurations for several containers, as does XFire[11].


                      -- oOo --

OK, thanks for reading this far.

My impression is that with all these changes, Cocoon will be sexy again. Add a bit of runtime analysis of databases and automatic generation of CForms to the picture, and you have something that has the same productivity as RoR, but in a J2EE environment. It also includes what I learned when working on Ajax and the consequences it has on the overall system architecture.

You certainly have noticed that the above is more about the controller than about the sitemap. This is because not many changes are needed there, except content-aware matchers and selectors. But a more featureful controller will allow us to trash a great number of pipeline components that were invented to circumvent controller limitations. The code base will shrink.

There are also a number of simplifications that can be done by using builtin conventions over configuration, but I'll write about this later.

Tell me your thoughts. Am I completely off-track, or do you also want to build this great new thing?


Sylvain

[1] http://www.djangoproject.com/
[2] http://www.andrewsavory.com/blog/archives/000976.html
[3] http://www.djangoproject.com/documentation/url_dispatch/
[4] http://www.janino.net/
[5] http://svn.apache.org/repos/asf/forrest/trunk/main/java/org/apache/forrest/sourcetype/SourceTypeAction.java
[6] http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=101554683923592&w=2
[7] http://stax.codehaus.org/
[8] http://servicemix.codehaus.org/
[9] http://svn.apache.org/repos/asf/cocoon/blocks/forms/trunk/java/org/apache/cocoon/forms/flow/javascript/Form.js
[10] http://bluxte.net/blog/2005-11/17-49-57.html
[11] http://xfire.codehaus.org/

--
Sylvain Wallez                        Anyware Technologies
http://bluxte.net                     http://www.anyware-tech.com
Apache Software Foundation Member     Research & Technology Director
