Hi all,
For many years, I have been more than happy with Cocoon, enjoying the
power and ease it brought for both publication and webapp projects. Over
the last months however, other feelings have emerged: there are things
that are definitely overly complex in Cocoon, and there have been some
emerging frameworks leading to "wow, cool!" reactions rather than "yeah,
yet another one". Also, I strongly believe that the Ajax revolution is
quickly obsoleting the traditional reload-page-on-user-action model that
prevailed on the web until recently, and that it requires frameworks that
help build these new kinds of applications.
All this to say that if we want Cocoon to have a bright future, it must
go through a mutation. This RT dumps my current ideas about this,
depicting something that IMO better fits today's needs while taking into
account many years of innovation and what we learned from them.
-- oOo --
First of all, let's consider the place where Cocoon fits in the large
family of webapp development frameworks.
On one end, we have pure-J2EE things like Struts or JSF. They lead to
writing a lot of Java classes, lots of XML config files, and try/fail
cycles require compile/deploy/restart, even if some tools ease the task.
Despite their heavyweight development process, they are widely accepted
in large companies, both because the J2EE stamp pleases managers and
because of the vast number of enterprise-grade libraries available.
On the other end, we have scripted frameworks like Ruby on Rails,
Django[1], etc. The try/fail cycle is basically save/reload, and writing
simple stuff is very fast because of the use of convention over
configuration and runtime generation of data models (e.g. through
database introspection). Now recent comments[2] show that going beyond
the basic stuff may not be that easy.
Cocoon actually sits in the middle of this spectrum:
- it's a Java servlet and can therefore use almost anything that's
written in Java. The contents of our blocks show this well! As such it
is somewhat J2EE compliant and can be deployed in large companies, even
if we have to convince managers that it's better than Struts.
- it's a scripted framework: sitemap, XSL, templates, flowscript. Save
and reload! And this goes further in 2.2 with auto reloading and
compiling classloaders.
So theoretically, Cocoon could be the "RoR of J2EE". But in its current
incarnation it won't be. The learning curve is too steep, some
architectural choices imposed by Cocoon actually get in the way of
developers using the new emerging development practices, and 5 years of
legacy have led to a rather confusing picture, with tons of legacy
components and many inconsistencies.
Now Cocoon has also introduced a number of super-cool features and
innovations that definitely make sense, but in a more lightweight and
consistent environment.
So let's draw a picture of what Cocoon 3.0 could be. I use a major
version as the ideas outlined below are more than likely to require a
new code base, even if many code snippets can be reused from the current code.
-- oOo --
Giving its real role to the controller
--------------------------------------
When we introduced flowscript, we decided that <map:pipeline> should be
the central switchboard through which *all* requests go, and introduced
<map:call function>. This leads most webapps written in Cocoon to have
their sitemap starting with something like:
    <map:match pattern="do_*">
      <map:call function="do_{1}"/>
    </map:match>
Why in hell do we have to go through the sitemap to call a function and
then go back to the sitemap through cocoon.sendPage()? This not only
clutters up the sitemap with cut'n pasted snippets, but also makes the
flowscript a second-class citizen in the application.
So I think we should slightly change the semantics of the sitemap and the
priorities of its various components:
- the sitemap is the configuration of the overall request processing of
the application (or a subpart of it in the case of mounts). It defines
the configuration of that request processing, which is composed of
components (<map:components>), controllers (<map:flow>) and views
(<map:pipeline>). And I even think these 3 parts should really be split
in different files, i.e. moving components to a xconf and pipeline
definitions to e.g. "pipelines.xml".
- the processing flow in a sitemap goes *first* in the controller if
there is one, and *second* in the view. Going to a <map:pipeline> to
call back a function should really be an exceptional case, or even
forbidden.
- since it is no longer called by the sitemap, the controller defines a
single entry point, such as "process()". A builtin default
implementation provides an equivalent to <map:call
function="public_{request:sitemapURI}"/>, thus automatically publishing
any public_xxx flowscript function. This is similar to
HttpServlet.service() that calls doGet(), doPost(), etc. depending on the
HTTP method but still allows overriding service().
- to allow sophisticated implementations of process() where needed, the
matchers and selectors are made available to the controller, so that
they can check the request environment as easily as the pipeline
statements in sitemap.xmap (also have a look at the Django
dispatcher[3]). This would make it possible to write something like:
    var match = cocoon.matchers.wildcard("admin/*");
    if (match) {
        if (authenticateAdmin()) {
            // call the function named by the '*'
            adminServices[match[1]]();
        } else {
            forbidden();
        }
    }
- calling cocoon.sendPage(uri) directly goes to the <map:pipeline>
section of the sitemap, to build a view. This seems obvious, but has an
interesting side-effect: there is no longer any need to invent a private URL
space such as "view-*" to have a two-step processing (controller/view)
of the requests. We can even say that "cocoon.sendPage(null)" calls the
sitemap with the current request URI untouched.
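To make the dispatching described in this list a bit more concrete,
here is a rough sketch in plain JavaScript. Everything in it is
hypothetical: wildcard(), ownFunction() and makeProcess() are names
invented for the illustration, not existing Cocoon APIs, and the real
matchers would be the sitemap ones rather than this small regex helper.

```javascript
// Hypothetical wildcard matcher: "*" matches any run of characters except "/".
// Returns null on mismatch, else an array: [wholeUri, capture1, ...].
function wildcard(pattern, uri) {
    // Escape regex metacharacters, then turn each "*" into a capture group.
    var escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    var regex = new RegExp("^" + escaped.replace(/\*/g, "([^/]*)") + "$");
    return regex.exec(uri);
}

// Returns a function defined directly on obj (not inherited), or null.
function ownFunction(obj, name) {
    var own = Object.prototype.hasOwnProperty.call(obj, name);
    return own && typeof obj[name] === "function" ? obj[name] : null;
}

// Hypothetical default process(): "admin/*" goes to adminServices after an
// authentication check; any other URI "xxx" is mapped to public_xxx().
function makeProcess(controller, adminServices, authenticateAdmin) {
    return function process(uri) {
        var match = wildcard("admin/*", uri);
        if (match) {
            if (!authenticateAdmin()) {
                return "forbidden";
            }
            var service = ownFunction(adminServices, match[1]);
            return service ? service() : "not found";
        }
        var fn = ownFunction(controller, "public_" + uri);
        return fn ? fn() : "not found";
    };
}
```

Only functions carrying the public_ prefix are reachable from a URI in
this sketch, so helper functions in the same script stay private
without any extra configuration.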
Note: the controller examples in this RT are written in JavaScript, but
JavaFlow should be considered on an equal footing, as Cocoon's user base
is two-sided, composed of people coming from the webdesign/php world,
and others coming from the J2EE world. Also, JavaFlow should be the
language of choice for builtin reusable helper controllers.
-- oOo --
Expression languages
--------------------
Do you know how many expression languages there are in Cocoon? Java,
JavaScript, XPath, XReporter, JEXL, etc. There's also all the
micro-languages defined by each of the input modules: many of them use
XPath, but not all...
Also, the way to access a given piece of data is not the same in the sitemap
(e.g. "{request-param:foo}") and in JXTG
("${cocoon.request.getParameter('foo')}" or even
"#{$cocoon/request/parameters/foo}").
We should restrict the number of languages to the useful minimum, and
ensure they can be used consistently everywhere. This useful minimum
seems to me to be JavaScript, XPath and Java (using Janino[4]).
As for the syntax, I think we should use the simple "{..}" notation,
with no initial character. To choose among the 3 expression languages,
we have to choose a default one, and use prefixed expressions for the
other ones. I consider JS to be the most versatile and thus to be the
default language.
That means we'll have "{cocoon.request.remoteHost}" or
"{xpath:$cocoon/request/remoteHost}" or
"{java:cocoon.getRequest().getRemoteHost()}".
About XPath, I'm a bit skeptical wrt its actual usefulness with non-XML
objects, which often looks weird. However, we need to be able to call
XPath on DOM parts of a non-DOM data model, e.g.
"{xpath(cocoon.session.attributes.userDoc, '/meta/dc:title')}". And
interestingly this sample shows that a namespace prefix table must be
available in the expression context for *all* languages.
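As an illustration of the "{..}" notation with a default language and
prefixes for the others, here is a minimal sketch of how an expression
could be split into its language and its source. The function name, the
returned object shape and the "js" label are assumptions made for the
example, nothing of the sort exists in Cocoon today.

```javascript
// Languages selectable by prefix; anything else is treated as JavaScript.
var PREFIXED_LANGUAGES = ["xpath", "java"];

// Splits "{...}" into { language, source }. Returns null when the text is
// not a single braced expression.
function parseExpression(text) {
    var last = text.length - 1;
    if (last < 1 || text.charAt(0) !== "{" || text.charAt(last) !== "}") {
        return null;
    }
    var body = text.substring(1, last);
    for (var i = 0; i < PREFIXED_LANGUAGES.length; i++) {
        var prefix = PREFIXED_LANGUAGES[i] + ":";
        if (body.indexOf(prefix) === 0) {
            return {
                language: PREFIXED_LANGUAGES[i],
                source: body.substring(prefix.length)
            };
        }
    }
    // No known prefix: this is the default language.
    return { language: "js", source: body };
}
```

Only the known prefixes switch languages here, so a JavaScript
expression that happens to contain a colon, such as "{a ? b : c}", is
still handled as JavaScript.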
All this also means that we need a well-defined "cocoon" object defined
identically in all contexts. Additional top-level objects can be
available to provide context-specific data, such as "flow.sendPage()",
"sitemap.resolve('../1')" or "template.consumer".
-- oOo --
Content-aware pipelines
-----------------------
Cocoon 1 had DOM-based processing, meaning transformations could be
chosen according to the pipeline content. Cocoon 2, when switching to
SAX-based streamed pipelines, abandoned this ability. This hasn't been a
real problem for a long time, as datasources were mostly passive
documents of a well-known structure.
Now things have changed a lot, and we have to deal with heterogeneous
data types and content-driven processing. Let's take some real-life
examples:
- Content syndication: a feed's URL can provide RSS 0.9, 1.0, 2.0 or
Atom. How can we decide what processing has to be applied on a feed if
we don't know what's inside?
- Forrest's infamous SourceTypeAction[5] identifies a document's type
using pull parsing.
- SOAP requests: why is SOAP so badly integrated with Cocoon? We
basically need to delegate to Axis that will then call a Java class. Why
so? Because we're unable to choose the service to be called depending on
the request's content.
- finally, the ESB buzz is turning into real projects, and requires
content-based routing of messages.
There were some proposals to implement content-aware selectors[6] but
they never materialized because of the impedance mismatch between a SAX
stream and the usage of DOM (so Cocoon-1-ish!) that was proposed to
implement them.
Now Forrest's SourceTypeAction shows us the way: pull parsing.
So let's switch pipelines from SAX push to StAX pull (JSR 173, see[7]).
Content-aware matchers and selectors can then grab just the amount of
information they need from the pipeline to make their decision. And
unlike the SourceTypeAction, which has to resolve the source twice
(once for pull, once for push), the pipeline engine can
transparently buffer the StAX events consumed by matchers and selectors
to replay them in the next pipeline component.
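The buffering idea can be sketched in a few lines of JavaScript. This
is not the StAX API: events are plain objects, a "pull source" is
assumed to be any object with a next() method that returns null at the
end, and ReplayableSource, rootElementName() and arraySource() are
hypothetical names used only for this illustration.

```javascript
// Wraps a pull source so that events read between mark() and rewind() are
// kept in a buffer and delivered again to the next consumer.
function ReplayableSource(source) {
    this.source = source;
    this.buffer = [];       // events already pulled from the source
    this.position = 0;      // index of the next buffered event to deliver
    this.recording = false;
}

ReplayableSource.prototype.mark = function () {
    // Keep only what has not been delivered yet, then start recording.
    this.buffer = this.buffer.slice(this.position);
    this.position = 0;
    this.recording = true;
};

ReplayableSource.prototype.rewind = function () {
    this.position = 0;
    this.recording = false;
};

ReplayableSource.prototype.next = function () {
    if (this.position < this.buffer.length) {
        return this.buffer[this.position++];
    }
    var event = this.source.next();
    if (event !== null && this.recording) {
        this.buffer.push(event);
        this.position++;
    }
    return event;
};

// Hypothetical content-aware matcher: peeks at the root element name and
// leaves the stream untouched for whoever reads it next.
function rootElementName(replayable) {
    replayable.mark();
    var name = null;
    var event;
    while ((event = replayable.next()) !== null) {
        if (event.type === "start") {
            name = event.name;
            break;
        }
    }
    replayable.rewind();
    return name;
}

// Helper for the example: a pull source backed by an array of events.
function arraySource(events) {
    var index = 0;
    return {
        next: function () {
            return index < events.length ? events[index++] : null;
        }
    };
}
```

The matcher only pulls as many events as it needs (here, up to the
first start tag), and the source itself is read exactly once.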
Using pull pipelines doesn't mean we have to trash everything.
Converting DOM to/from StAX is straightforward, and so is StAX->SAX. The
SAX->StAX conversion is less easy and requires either buffering or a
separate thread.
Using pull pipelines also has an interesting side effect on
aggregations, as they can easily be inlined by pulling events
successively from partial pipelines (i.e. without a serializer), e.g:
    <map:aggregate element="root">
      <map:part>
        <map:generate src="header.xml"/>
        <map:transform src="header2html.xsl"/>
      </map:part>
      <map:part src="content/{1}.xml"/>
    </map:aggregate>
    <map:transform src="layout.xsl"/>
    <map:serialize/>
Actually, writing
    <map:part src="foo"/>
is equivalent to writing
    <map:part>
      <map:generate type="file" src="foo"/>
    </map:part>
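Inlining an aggregation by pulling from each part in turn could look
roughly like the sketch below. As before, this is plain JavaScript with
made-up event objects rather than StAX, and aggregate() and fromArray()
are hypothetical names; each part stands for a partial pipeline, i.e.
any object with a next() method returning null when exhausted.

```javascript
// Helper for the example: a pull source backed by an array of events.
function fromArray(events) {
    var index = 0;
    return {
        next: function () {
            return index < events.length ? events[index++] : null;
        }
    };
}

// Emits <element>, then every event of each part in order, then </element>.
// Parts are pulled lazily: a part is not touched before the previous one
// is exhausted.
function aggregate(element, parts) {
    var state = "open";   // "open" -> "parts" -> "done"
    var current = 0;
    return {
        next: function () {
            if (state === "open") {
                state = "parts";
                return { type: "start", name: element };
            }
            while (state === "parts" && current < parts.length) {
                var event = parts[current].next();
                if (event !== null) {
                    return event;
                }
                current++;   // this part is exhausted, move on to the next
            }
            if (state === "parts") {
                state = "done";
                return { type: "end", name: element };
            }
            return null;
        }
    };
}
```

The result is itself a pull source, so it can be handed to the next
transformer exactly like any other partial pipeline.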
-- oOo --
Dynamic pipelines
-----------------
Yes, you read that right: dynamic pipelines. This is what comes next
naturally after content-aware pipelines: with use cases like webservices
and ESBs, the content-based routing is not enough and we also need
controller-driven routing.
For simpler cases, we already have cocoon.processPipelineTo() (and the
more versatile PipelineUtils class), but having to call the sitemap and
invent a private URL just to perform a transformation is really overkill.
I'd like to be able to write the following in a flowscript:
    var pipeline = flow.newPipeline("non-caching");
    pipeline.setGenerator("stream");
    pipeline.addTransformer("xslt", "normalize.xsl");
    if (cocoon.matchers.xpath("/foo/[EMAIL PROTECTED] = '" +
            cocoon.session.attributes.bar_id + "']", pipeline)) {
        handleBar(pipeline);
    } else {
        wrongId();
    }
What we can see above is that we don't even need a serializer for the
pipeline to be useful, as we can pull events from it as soon as it has a
generator. And that generator could well be another pipeline built
somewhere else.
Basically, the pipeline engine becomes a very general-purpose object
that can be used not only in the sitemap (to build views), but also in
the controller for content-driven business logic decisions.
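A pipeline usable without a serializer, and usable as the generator of
another pipeline, could be modelled along these lines. This is only a
sketch under simplifying assumptions: components are passed as objects
and functions instead of being looked up by name ("stream", "xslt"),
events are plain objects, and Pipeline, fromArray() and rename() are
hypothetical names.

```javascript
// A generator is any pull source (an object with next() returning null at
// the end). A transformer is a function taking a pull source and returning
// another pull source.
function Pipeline() {
    this.generator = null;
    this.transformers = [];
    this.chain = null;      // built lazily on the first next() call
}

Pipeline.prototype.setGenerator = function (source) {
    this.generator = source;
    return this;
};

Pipeline.prototype.addTransformer = function (transformer) {
    this.transformers.push(transformer);
    return this;
};

// The pipeline is itself a pull source, so it can feed another pipeline.
Pipeline.prototype.next = function () {
    if (this.chain === null) {
        if (this.generator === null) {
            throw new Error("pipeline has no generator");
        }
        var chain = this.generator;
        for (var i = 0; i < this.transformers.length; i++) {
            chain = this.transformers[i](chain);
        }
        this.chain = chain;
    }
    return this.chain.next();
};

// Helper for the example: a pull source backed by an array of events.
function fromArray(events) {
    var index = 0;
    return {
        next: function () {
            return index < events.length ? events[index++] : null;
        }
    };
}

// Example transformer factory: renames elements called `from` to `to`.
function rename(from, to) {
    return function (source) {
        return {
            next: function () {
                var event = source.next();
                if (event !== null && event.name === from) {
                    return { type: event.type, name: to };
                }
                return event;
            }
        };
    };
}
```

For instance, new Pipeline().setGenerator(inner).addTransformer(...)
works when inner is another Pipeline, which is the "generator could
well be another pipeline" case.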
This programmatic building of pipelines can also be used by Cocoon
components themselves to implement some built-in transformations, e.g.
converting an XMLSchema to a CForms definition, without requiring users
to copy/paste the corresponding sitemap instructions into their sitemaps,
or to call a system-defined sitemap. IMO, the lack of reusable
system pipelines is one of the reasons why there haven't been many
off-the-shelf products or applications built on top of Cocoon.
Being able to directly use pipelines can also ease the integration of
Cocoon as a transformation engine in other environments, such as an
advanced message transformer in the ServiceMix ESB[8].
-- oOo --
Controller-driven responses
---------------------------
The advent of Ajax applications leads to a radical change in web
application architectures. There are many requests that don't lead to
producing a view, but sending data and/or control information. Having to
call a pipeline for this is really useless and overkill, as we don't
need any kind of processing.
We therefore need the controller to be able to directly send a
non-processed response. We already have an example of this in the Ajax
stuff for CForms[9] to send a simple <bu:continue> when form interaction
is finished and a full page reload is needed. Another example is data
transmission with an Ajax client using JSON[10].
So we need additional "sendxxx" methods in the controller: sendText(),
sendObject(), sendBytes() and why not sendStream().
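Here is a small sketch of what two of these methods could do. It
assumes that sendObject() means "serialize the object as JSON", which
is only one possible reading, and it uses a stand-in for the HTTP
response: any object with a headers map and a write(string) method.
Responder is a hypothetical name, not an existing Cocoon class.

```javascript
// Hypothetical wrapper giving the controller direct access to the response.
function Responder(out) {
    this.out = out;
    this.committed = false;   // a response can only be sent once
}

// Sends a string as-is, without going through any pipeline.
Responder.prototype.sendText = function (text, contentType) {
    if (this.committed) {
        throw new Error("response already sent");
    }
    this.out.headers["Content-Type"] = contentType || "text/plain; charset=utf-8";
    this.out.write(String(text));
    this.committed = true;
};

// Assumption: objects are sent as JSON.
Responder.prototype.sendObject = function (object) {
    this.sendText(JSON.stringify(object), "application/json");
};
```

A controller function answering an Ajax request would then end with
something like responder.sendObject({ items: 2, ok: true }) instead of
calling sendPage().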
Ajax applications also require aggregations defined at the controller
level. Let's consider an Ajax shopping cart application: the page
displays the items catalogue and a sidebar with the current content of
the shopping cart. When the user browses the items, only the catalogue
area needs to be refreshed on the page. When he adds an item to the
cart, both areas need to be refreshed at once to show the updated cart.
The knowledge of what parts of the page need to be refreshed is in the
controller. A solution can be to call a pipeline that will generate an
XInclude that itself will call other pipelines, but that's smelly and
doesn't allow giving different view data to each of the pipelines.
To allow this, we need something like:
    flow.sendMultiple(
        ["catalogue", { paginator: paginator }],
        ["cart-sidebar", { cart: cart }]
    );
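One possible shape for such a function is sketched below. The two
collaborators are assumptions made for the example: renderView(name,
data) stands for running the view pipeline of that name with its own
view data, and send(fragments) stands for writing the combined
response. How the fragments are packaged for the browser is left open.

```javascript
// Hypothetical factory for sendMultiple(). Each argument of the returned
// function is a [viewName, viewData] pair.
function makeSendMultiple(renderView, send) {
    return function sendMultiple() {
        var fragments = [];
        for (var i = 0; i < arguments.length; i++) {
            var name = arguments[i][0];
            var data = arguments[i][1];
            // Every view only sees the data passed for it.
            fragments.push({ name: name, content: renderView(name, data) });
        }
        send(fragments);
    };
}
```

The fragments keep the order in which the controller listed them, so
the client can update the page areas in that same order.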
-- oOo --
Core components
---------------
Moving to pull pipelines isn't the only important core change: we need
to move away from Avalon for good. Now what container will we use? We
don't care: Cocoon 3.0 will be written as POJOs, and will come with a
"default" container. Will it be Spring, Hivemind, Pico? I don't know. We
may even provide configurations for several containers, as does
XFire[11].
-- oOo --
OK, thanks for reading this far.
My impression is that with all these changes, Cocoon will be sexy again.
Add a bit of runtime analysis of databases and automatic generation of
CForms to the picture, and you have something that has the same
productivity as RoR, but in a J2EE environment. It also includes what I
learned when working on Ajax and the consequences it has on the overall
system architecture.
You certainly have noticed that the above is more about the controller
than about the sitemap. This is because not many changes are needed
there, except content-aware matchers and selectors. But a more featured
controller will allow us to trash a great number of pipeline components
that were invented to circumvent controller limitations. The code base
will shrink.
There are also a number of simplifications that can be done by using
builtin conventions over configuration, but I'll write about this later.
Tell me your thoughts. Am I completely off-track, or do you also want to
build this great new thing?
Sylvain
[1] http://www.djangoproject.com/
[2] http://www.andrewsavory.com/blog/archives/000976.html
[3] http://www.djangoproject.com/documentation/url_dispatch/
[4] http://www.janino.net/
[5] http://svn.apache.org/repos/asf/forrest/trunk/main/java/org/apache/forrest/sourcetype/SourceTypeAction.java
[6] http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=101554683923592&w=2
[7] http://stax.codehaus.org/
[8] http://servicemix.codehaus.org/
[9] http://svn.apache.org/repos/asf/cocoon/blocks/forms/trunk/java/org/apache/cocoon/forms/flow/javascript/Form.js
[10] http://bluxte.net/blog/2005-11/17-49-57.html
[11] http://xfire.codehaus.org/
--
Sylvain Wallez Anyware Technologies
http://bluxte.net http://www.anyware-tech.com
Apache Software Foundation Member Research & Technology Director