Hi all,

For many years, I have been more than happy with Cocoon, enjoying the power and ease it brought for both publication and webapp projects. Over the last months however, other feelings have emerged: there are things that are definitely overly complex in Cocoon, and there have been some emerging frameworks leading to "wow, cool!" reactions rather than "yeah, yet another one". Also, I strongly believe that the Ajax revolution is quickly obsoleting the traditional reload-page-on-user-action model that prevailed on the web until recently, and that it requires frameworks that help build these new kinds of applications.

All this to say that if we want Cocoon to have a bright future, it must go through a mutation. This RT dumps my current ideas about this, depicting something that IMO better fits today's needs while taking into account many years of innovation and what we learned from them.

                      -- oOo --

First of all, let's consider the place where Cocoon fits in the large family of webapp development frameworks.

On one end, we have pure-J2EE things like Struts or JSF. They lead to writing a lot of Java classes, lots of XML config files, and try/fail cycles require compile/deploy/restart, even if some tools ease the task. Despite their heavyweight development process, they are widely accepted in large companies, both because the J2EE stamp pleases managers and because of the vast number of enterprise-grade libraries available.

On the other end, we have scripted frameworks like Ruby on Rails, Django[1], etc. The try/fail cycle is basically save/reload, and writing simple stuff is very fast because of the use of convention over configuration and runtime generation of data models (e.g. through database introspection). Now recent comments[2] show that going beyond the basic stuff may not be that easy.

Cocoon actually sits in the middle of this spectrum:
- it's a Java servlet and can therefore use almost anything that's written in Java. The contents of our blocks show this well! As such it is somehow J2EE compliant and can be deployed in large companies, even if we have to convince managers that it's better than Struts.
- it's a scripted framework: sitemap, XSL, templates, flowscript. Save and reload! And this goes further in 2.2 with auto reloading and compiling classloaders.

So theoretically, Cocoon could be the "RoR of J2EE". Now in its current incarnation it won't be. The learning curve is too steep, some architectural choices imposed by Cocoon actually get in the way of developers using the newly emerging development practices, and 5 years of legacy have led to a rather confusing picture, with tons of legacy components and many inconsistencies.

Now Cocoon has also introduced a number of super-cool features and innovations that definitely make sense, but in a more lightweight and consistent environment.

So let's draw a picture of what could be Cocoon 3.0. I use a major version as the ideas outlined below are more than likely to require a new code base, even if many code snippets can be reused from the current code.

                      -- oOo --

Giving its real role to the controller
--------------------------------------

When we introduced flowscript, we decided that <map:pipeline> should be the central switchboard through which *all* requests go, and introduced <map:call function>. This leads most webapps written in Cocoon to have their sitemap starting with something like:

 <map:match pattern="do_*">
   <map:call function="do_{1}"/>
 </map:match>

Why in hell do we have to go through the sitemap to call a function and then go back to the sitemap through cocoon.sendPage()? This not only clutters up the sitemap with cut'n pasted snippets, but also makes the flowscript a second-class citizen in the application.

So I think we should change a bit the semantics of the sitemap and the priorities of its various components:

- the sitemap is the configuration of the overall request processing of the application (or a subpart of it in the case of mounts). It defines the configuration of that request processing, which is composed of components (<map:components>), controllers (<map:flow>) and views (<map:pipeline>). And I even think these 3 parts should really be split in different files, i.e. moving components to a xconf and pipeline definitions to e.g. "pipelines.xml".

- the processing flow in a sitemap goes *first* in the controller if there is one, and *second* in the view. Going to a <map:pipeline> to call back a function should really be an exceptional case, or even forbidden.

- since it's no longer called by the sitemap, the controller defines a single entry point, such as "process()". A builtin default implementation provides an equivalent to <map:call function="public_{request:sitemapURI}"/>, thus automatically publishing any public_xxx flowscript function. This is similar to HttpServlet.service() that calls doGet(), doPost(), etc depending on the HTTP method but still allows overriding service().

- to allow sophisticated implementations of process() where needed, the matchers and selectors are made available to the controller, so that they can check the request environment as easily as the pipeline statements in sitemap.xmap (also have a look at the Django dispatcher[3]). This would allow writing something like:

 var match = cocoon.matchers.wildcard("admin/*");
 if (match) {
     if (authenticateAdmin()) {
         // call the function named by the '*'
         adminServices[match[1]]();
     } else {
         forbidden();
     }
 }

- calling cocoon.sendPage(uri) directly goes to the <map:pipeline> section of the sitemap, to build a view. This seems obvious, but has an interesting side-effect: there is no more need to invent a private URL space such as "view-*" to have a two-step processing (controller/view) of the requests. We can even say that "cocoon.sendPage(null)" calls the sitemap with the current request URI untouched.
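
To make the controller-first dispatch above a bit more concrete, here is a minimal sketch in plain JavaScript that runs outside Cocoon. Nothing in it is existing Cocoon API: wildcard(), makeDefaultProcess(), the "scope" object standing in for the flowscript global scope, and the { handled, result } return value are all assumptions made for illustration. The matcher only supports '*' (any run of characters except '/').

```javascript
// Hypothetical sketch, not Cocoon API.

// Wildcard matcher usable from a controller. Simplification: only '*' is
// supported. Returns [whole, capture1, ...] on a match, or null otherwise.
function wildcard(pattern, uri) {
    var source = pattern
        .replace(/[.+?^${}()|[\]\\]/g, "\\$&")   // escape regex metacharacters
        .replace(/\*/g, "([^/]*)");              // '*' becomes a capture group
    var m = new RegExp("^" + source + "$").exec(uri);
    return m ? Array.prototype.slice.call(m) : null;
}

// Default process(): publishes every function named public_xxx.
// 'scope' stands in for the flowscript global scope (an assumption).
function makeDefaultProcess(scope) {
    return function (sitemapURI) {
        var name = "public_" + sitemapURI;
        if (typeof scope[name] !== "function") {
            return { handled: false };   // nothing published under that name
        }
        return { handled: true, result: scope[name]() };
    };
}

var scope = {
    public_login: function () { return "login page"; },
    helper: function () { return "private"; }   // no public_ prefix: not published
};
var processRequest = makeDefaultProcess(scope);
var match = wildcard("admin/*", "admin/users");  // match[1] is the '*' part
```

A real implementation would fall through to the <map:pipeline> section when "handled" is false, which is exactly the "controller first, view second" ordering.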

Note: the controller examples in this RT are written in JavaScript, but JavaFlow should be considered on an equal ground, as Cocoon's user base is two-sided, composed of people coming from the webdesign/php world, and others coming from the J2EE world. Also, JavaFlow should be the language of choice for builtin reusable helper controllers.

                      -- oOo --

Expression languages
--------------------

Do you know how many expression languages there are in Cocoon? Java, JavaScript, XPath, XReporter, JEXL, etc. There's also all the micro-languages defined by each of the input modules: many of them use XPath, but not all...

Also, the way to access a given piece of data is not the same in the sitemap (e.g. "{request-param:foo}") and in JXTG ("${cocoon.request.getParameter('foo')}" or even "#{$cocoon/request/parameters/foo}").

We should restrict the number of languages to the useful minimum, and ensure they can be used consistently everywhere. This useful minimum looks to me as being JavaScript, XPath and Java (using Janino[4]).

As for the syntax, I think we should use the simple "{..}" notation, with no initial character. To choose among the 3 expression languages, we have to choose a default one, and use prefixed expressions for the other ones. I consider JS to be the most versatile and thus to be the default language.

That means we'll have "{cocoon.request.remoteHost}" or "{xpath:$cocoon/request/remoteHost}" or "{java:cocoon.getRequest().getRemoteHost()}".
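
As a rough illustration of how such prefixed expressions could be told apart, here is a small sketch. The parseExpression() function and the shape of its result are hypothetical; only the "{..}" notation, the "xpath:" and "java:" prefixes and the JS default come from the proposal above.

```javascript
// Hypothetical sketch: split a "{...}" expression into language and body.
// JavaScript is the default; "xpath:" and "java:" select the other two.
var PREFIXED_LANGUAGES = ["xpath", "java"];

function parseExpression(text) {
    if (text.charAt(0) !== "{" || text.charAt(text.length - 1) !== "}") {
        throw new Error("Not an expression: " + text);
    }
    var body = text.substring(1, text.length - 1);
    for (var i = 0; i < PREFIXED_LANGUAGES.length; i++) {
        var prefix = PREFIXED_LANGUAGES[i] + ":";
        if (body.indexOf(prefix) === 0) {
            return { language: PREFIXED_LANGUAGES[i], body: body.substring(prefix.length) };
        }
    }
    // No known prefix: the whole body is a JavaScript expression
    return { language: "js", body: body };
}

var jsExpr = parseExpression("{cocoon.request.remoteHost}");
var xpathExpr = parseExpression("{xpath:$cocoon/request/remoteHost}");
var javaExpr = parseExpression("{java:cocoon.getRequest().getRemoteHost()}");
```

The point of the sketch is only that the same parsing step can be shared by the sitemap, the templates and the controller, so that an expression means the same thing everywhere.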

About XPath, I'm a bit skeptical wrt its actual usefulness with non-XML objects, which often looks weird. However, we need to be able to call XPath on DOM parts of a non-DOM data model, e.g. "{xpath(cocoon.session.attributes.userDoc, '/meta/dc:title')}". And interestingly this sample shows that a namespace prefix table must be available in the expression context for *all* languages.

All this also means that we need a well-defined "cocoon" object defined identically in all contexts. Additional top-level objects can be available to provide context-specific data, such as "flow.sendPage()", "sitemap.resolve('../1')" or "template.consumer".

                      -- oOo --

Content-aware pipelines
-----------------------

Cocoon 1 had a DOM-based processing, meaning transformations could be chosen according to the pipeline content. Cocoon 2, when switching to SAX-based streamed pipelines, abandoned this ability. This hasn't been a real problem for a long time, as datasources were mostly passive documents of a well-known structure.

Now things have changed a lot, and we have to deal with heterogeneous data types and content-driven processing. Let's take some real-life examples:
- Content syndication: a feed's URL can provide RSS 0.9, 1.0, 2.0 or Atom. How can we decide what processing has to be applied on a feed if we don't know what's inside?
- Forrest's infamous SourceTypeAction[5] identifies a document's type using pull parsing
- SOAP requests: why is SOAP so badly integrated with Cocoon? We basically need to delegate to Axis that will then call a Java class. Why so? Because we're unable to choose the service to be called depending on the request's content.
- finally, the ESB buzz is turning into real projects, and requires content-based routing of messages.

There were some proposals to implement content-aware selectors[6] but they never materialized because of the impedance mismatch between a SAX stream and the usage of DOM (so Cocoon-1-ish!) that was proposed to implement them.

Now Forrest's SourceTypeAction shows us the way: pull parsing.

So let's switch pipelines from SAX push to StAX pull (JSR 173, see[7]). Content-aware matchers and selectors can then grab just the amount of information they need from the pipeline to make their decision. And contrary to the SourceTypeAction, which requires resolving the source twice (once for pull, once for push), the pipeline engine can transparently buffer the StAX events consumed by matchers and selectors to replay them in the next pipeline component.
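
The buffer-and-replay idea can be sketched in a few lines of plain JavaScript. This is not StAX and not Cocoon code: events are plain objects, and arraySource(), replayableSource(), rewind(), release() and rootElementName() are hypothetical names chosen for the illustration.

```javascript
// Hypothetical sketch: a pull source that remembers what matchers consumed,
// so the next pipeline component still sees the complete event stream.

// Stand-in for a parser: hands out events one by one, then null.
function arraySource(events) {
    var i = 0;
    return { next: function () { return i < events.length ? events[i++] : null; } };
}

function replayableSource(source) {
    var buffer = [];       // events pulled while matchers were inspecting
    var pos = 0;           // read position inside the buffer
    var released = false;  // true once handed over to the next component
    return {
        next: function () {
            if (released) {
                // Replay buffered events first, then stream straight through
                return buffer.length > 0 ? buffer.shift() : source.next();
            }
            if (pos < buffer.length) {
                return buffer[pos++];
            }
            var event = source.next();
            if (event !== null) {
                buffer.push(event);
                pos = buffer.length;
            }
            return event;
        },
        rewind: function () { pos = 0; },            // let another matcher look again
        release: function () { released = true; }    // stop buffering, start replay
    };
}

// A content-aware matcher: reads only up to the first start-element event.
function rootElementName(source) {
    var event;
    while ((event = source.next()) !== null) {
        if (event.type === "start") { return event.name; }
    }
    return null;
}

var feed = replayableSource(arraySource([
    { type: "start", name: "rss" },
    { type: "start", name: "channel" },
    { type: "end", name: "channel" },
    { type: "end", name: "rss" }
]));
var root = rootElementName(feed);   // consumes a single event
feed.release();                     // the next component gets all four events
```

Only the events actually inspected are buffered, and the source is read once, which is the difference with resolving it a second time.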

Using pull pipelines doesn't mean we have to trash everything. Converting DOM to/from StAX is straightforward, and so is StAX->SAX. The SAX->StAX conversion is less easy and requires either buffering or a separate thread.

Using pull pipelines also has an interesting side effect on aggregations, as they can easily be inlined by pulling events successively from partial pipelines (i.e. without a serializer), e.g:

 <map:aggregate element="root">
   <map:part>
     <map:generate src="header.xml"/>
     <map:transform src="header2html.xsl"/>
   </map:part>
   <map:part src="content/{1}.xml"/>
 </map:aggregate>
 <map:transform src="layout.xsl"/>
 <map:serialize/>

Actually, writing

 <map:part src="foo"/>

is equivalent to writing

 <map:part>
   <map:generate type="file" src="foo"/>
 </map:part>
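
Here is a sketch of what "pulling events successively from partial pipelines" could look like. It is plain JavaScript with made-up event objects; aggregate() and arraySource() are hypothetical helpers, not Cocoon API.

```javascript
// Hypothetical sketch: inline aggregation by pulling from each part in turn.

// Stand-in for a partial pipeline: hands out events one by one, then null.
function arraySource(events) {
    var i = 0;
    return { next: function () { return i < events.length ? events[i++] : null; } };
}

function aggregate(rootName, parts) {
    var index = -1;     // -1 means the root start event was not emitted yet
    var done = false;
    return {
        next: function () {
            if (done) { return null; }
            if (index === -1) {
                index = 0;
                return { type: "start", name: rootName };
            }
            while (index < parts.length) {
                var event = parts[index].next();
                if (event !== null) { return event; }
                index++;   // this part is exhausted, move on to the next one
            }
            done = true;
            return { type: "end", name: rootName };
        }
    };
}

var page = aggregate("root", [
    arraySource([{ type: "text", value: "header" }]),
    arraySource([{ type: "text", value: "content" }])
]);
```

Since the aggregate itself exposes next(), it can feed the following <map:transform> like any other pull source, with no intermediate buffering of the parts.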

                      -- oOo --

Dynamic pipelines
-----------------

Yes, you read that right: dynamic pipelines. This is what comes next naturally after content-aware pipelines: with use cases like webservices and ESBs, the content-based routing is not enough and we also need controller-driven routing.

For simpler cases, we already have cocoon.processPipelineTo() (and the more versatile PipelineUtils class), but having to call the sitemap and invent a private URL just to perform a transformation is really overkill.

I'd like to be able to write the following in a flowscript:

 var pipeline = flow.newPipeline("non-caching");
 pipeline.setGenerator("stream");
 pipeline.addTransformer("xslt", "normalize.xsl");
 if (cocoon.matchers.xpath("/foo/bar[@id = '" +
         cocoon.session.attributes.bar_id + "']", pipeline)) {
     handleBar(pipeline);
 } else {
     wrongId();
 }

What we can see above is that we don't even need a serializer for the pipeline to be useful, as we can pull events from it as soon as it has a generator. And that generator could well be another pipeline built somewhere else.
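
A minimal sketch of such a serializer-less pipeline, again in plain JavaScript outside Cocoon. The signatures are deliberately simpler than in the snippet above and are assumptions: here setGenerator() takes any object with a next() method and addTransformer() takes a function mapping one event to one event.

```javascript
// Hypothetical sketch: a pipeline that can be pulled as soon as it has a
// generator, and that can itself serve as the generator of another pipeline.

function arraySource(events) {
    var i = 0;
    return { next: function () { return i < events.length ? events[i++] : null; } };
}

function newPipeline() {
    var generator = null;
    var transformers = [];
    return {
        setGenerator: function (source) { generator = source; },
        // Simplification: each transformer maps exactly one event to one event
        addTransformer: function (fn) { transformers.push(fn); },
        // Pull one event through all transformers; null means end of stream
        next: function () {
            var event = generator.next();
            for (var i = 0; event !== null && i < transformers.length; i++) {
                event = transformers[i](event);
            }
            return event;
        }
    };
}

var inner = newPipeline();
inner.setGenerator(arraySource([{ name: "Foo" }, { name: "Bar" }]));
inner.addTransformer(function (e) { return { name: e.name.toLowerCase() }; });

var outer = newPipeline();
outer.setGenerator(inner);   // a pipeline used as another pipeline's generator
outer.addTransformer(function (e) { return { name: "x:" + e.name }; });
```

Pulling from "outer" drives "inner" on demand, so a controller can stop reading as soon as it has seen enough to take its decision.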

Basically, the pipeline engine becomes a very general-purpose object that can be used not only in the sitemap (to build views), but also in the controller for content-driven business logic decisions.

This programmatic building of pipelines can also be used by Cocoon components themselves to implement some built-in transformations, e.g. converting an XMLSchema to a CForms definition, without requiring users to copy/paste the corresponding sitemap instructions in their sitemaps, or to call a system-defined sitemap. IMO, the lack of reusable system pipelines is one of the reasons why there haven't been many off-the-shelf products or applications built on top of Cocoon.

Being able to directly use pipelines can also ease the integration of Cocoon as a transformation engine in other environments, such as an advanced message transformer in the ServiceMix ESB[8].

                      -- oOo --

Controller-driven responses
---------------------------

The advent of Ajax applications leads to a radical change in web application architectures. There are many requests that don't lead to producing a view, but to sending data and/or control information. Having to call a pipeline for this is really useless and overkill, as we don't need any kind of processing.

We therefore need the controller to be able to directly send a non-processed response. We already have an example of this in the Ajax stuff for CForms[9] to send a simple <bu:continue> when form interaction is finished and a full page reload is needed. Another example is data transmission with an Ajax client using JSON[10].

So we need additional "sendxxx" methods in the controller: sendText(), sendObject(), sendBytes() and why not sendStream().
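
For illustration only, two of these methods could look like the sketch below. The "response" object is a stand-in for the HTTP response, makeFlow() is a hypothetical factory, and serializing sendObject() as JSON is my assumption for the example (it matches the JSON use case above but is not decided anywhere).

```javascript
// Hypothetical sketch: controller methods that write a response directly,
// without going through a pipeline.
function makeFlow(response) {
    return {
        sendText: function (text) {
            response.contentType = "text/plain";
            response.body = String(text);
        },
        // Assumption: objects are sent as JSON
        sendObject: function (object) {
            response.contentType = "application/json";
            response.body = JSON.stringify(object);
        }
    };
}

var response = {};
var flow = makeFlow(response);
flow.sendObject({ items: 2 });
```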

Ajax applications also require aggregations defined at the controller level. Let's consider an Ajax shopping cart application: the page displays the items catalogue and a sidebar with the current content of the shopping cart. When the user browses the items, only the catalogue area needs to be refreshed on the page. When he adds an item to the cart, both areas need to be refreshed at once to show the updated cart. The knowledge of what parts of the page need to be refreshed is in the controller. A solution can be to call a pipeline that will generate an XInclude that itself will call other pipelines, but that's smelly and doesn't allow giving different view data to each of the pipelines.

To allow this, we need something like:
 flow.sendMultiple(
     ["catalogue", { paginator: paginator }],
     ["cart-sidebar", { cart: cart }]
 );
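
A possible shape for this, sketched in plain JavaScript. Here sendMultiple() takes the view renderer as an explicit first argument so that the example is self-contained; that parameter, the renderView() stub and the returned list of fragments are all assumptions, not a proposed API.

```javascript
// Hypothetical sketch: render several views, each with its own view data,
// and collect the fragments for a single response.
function sendMultiple(renderView, parts) {
    var fragments = [];
    for (var i = 0; i < parts.length; i++) {
        var uri = parts[i][0];
        var viewData = parts[i][1];
        fragments.push({ uri: uri, content: renderView(uri, viewData) });
    }
    return fragments;
}

// Stub standing in for "call the <map:pipeline> matching this URI"
function renderView(uri, viewData) {
    return "<div id='" + uri + "'>" + Object.keys(viewData).join(",") + "</div>";
}

var fragments = sendMultiple(renderView, [
    ["catalogue", { paginator: { page: 1 } }],
    ["cart-sidebar", { cart: { items: [] } }]
]);
```

Each view only sees the data the controller chose to give it, which is what the XInclude workaround cannot do.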

                      -- oOo --

Core components
---------------

Moving to pull pipelines isn't the only important core change: we need to move away from Avalon for good. Now what container will we use? We don't care: Cocoon 3.0 will be written as POJOs, and will come with a "default" container. Will it be Spring, Hivemind, Pico? I don't know. We may even provide configurations for several containers, as does XFire[11].

                      -- oOo --

Ok, thanks for reading this far.

My impression is that with all these changes, Cocoon will be sexy again. Add a bit of runtime analysis of databases and automatic generation of CForms to the picture, and you have something that has the same productivity as RoR, but in a J2EE environment. It also includes what I learned when working on Ajax and the consequences it has on the overall system architecture.

You certainly have noticed that the above is more about the controller than about the sitemap. This is because not many changes are needed there, except content-aware matchers and selectors. But a more featured controller will allow us to trash a great number of pipeline components that were invented to circumvent controller limitations. The code base will shrink.

There are also a number of simplifications that can be done by using builtin conventions over configuration, but I'll write about this later.

Tell me your thoughts. Am I completely off-track, or do you also want to build this great new thing?

Sylvain

[1] http://www.djangoproject.com/
[2] http://www.andrewsavory.com/blog/archives/000976.html
[3] http://www.djangoproject.com/documentation/url_dispatch/
[4] http://www.janino.net/
[5] http://svn.apache.org/repos/asf/forrest/trunk/main/java/org/apache/forrest/sourcetype/SourceTypeAction.java
[6] http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=101554683923592&w=2
[7] http://stax.codehaus.org/
[8] http://servicemix.codehaus.org/
[9] http://svn.apache.org/repos/asf/cocoon/blocks/forms/trunk/java/org/apache/cocoon/forms/flow/javascript/Form.js
[10] http://bluxte.net/blog/2005-11/17-49-57.html
[11] http://xfire.codehaus.org/

--
Sylvain Wallez                        Anyware Technologies
http://bluxte.net                     http://www.anyware-tech.com
Apache Software Foundation Member     Research & Technology Director
