Here's an RT about Unico's proposal of "flattening" the sitemap for the migration to Fortress. Please read carefully: this has a lot of implications.
Introduction
------------
Today is a public holiday in France. We "celebrate" (should we really enjoy that?) the end of World War I, and this is the occasion to explain to children what their great-grandfathers went through a century ago, hoping this won't happen again. I was doing some DIY at home, and manual work frees up my brain. So while digging in the garden, I was thinking about Unico's "iconoclast" proposal for the sitemap engine. Yes, the TreeProcessor is still somehow "my baby", and seeing it shaken as it is these days makes me think a lot about it.
And then came the sudden revelation: Unico's idea is brilliant and its implications go far beyond the migration to Fortress.
Implications
------------
Considering every sitemap statement as a component makes it very easy to implement a number of features that either have been wanted for a long time but were never implemented because of their complexity, or that will be needed for blocks:
1/ Virtual components
Virtual components are sitemap snippets that can be used in place of "regular" components. In many languages, these are called "macros". With sitemap statements as components, virtual components are a breeze to implement: just look up the component, and see if what's returned is a regular sitemap component (e.g. a Serializer) or a ProcessingNode. If it's a regular sitemap component, add it to the pipeline; otherwise invoke the ProcessingNode.
What I'm not sure about here is whether it's possible (or even desirable) to have two different implementation interfaces for a single role.
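To make the lookup-then-dispatch idea concrete, here is a minimal sketch. The ProcessingNode interface, the Serializer class and the SitemapComponents registry below are simplified, hypothetical stand-ins written for this mail, not the real Cocoon or Avalon types.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified stand-in for Cocoon's ProcessingNode.
interface ProcessingNode {
    // Lets the node contribute to the pipeline; returns true if it handled the call.
    boolean invoke(List<Object> pipeline);
}

// Hypothetical stand-in for a "regular" sitemap component.
class Serializer {
    final String name;
    Serializer(String name) { this.name = name; }
}

// Hypothetical registry standing in for the component manager.
class SitemapComponents {
    private final Map<String, Object> components = new HashMap<>();

    void register(String key, Object component) {
        components.put(key, component);
    }

    // Looks up the component and decides with a single instanceof check.
    void use(String key, List<Object> pipeline) {
        Object component = components.get(key);
        if (component == null) {
            throw new IllegalArgumentException("No component for " + key);
        }
        if (component instanceof ProcessingNode) {
            // Virtual component: a sitemap snippet, so invoke it.
            ((ProcessingNode) component).invoke(pipeline);
        } else {
            // Regular component: add it to the pipeline as is.
            pipeline.add(component);
        }
    }
}
```

Note that the registry stores plain Objects, which is exactly the "two implementation interfaces for a single role" question raised above.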
2/ Resource inheritance
Resources are nothing more than untyped virtual components (yeah Stefano, I know, they should be serializers). So if a resource isn't defined in a sitemap, we go up to the parent sitemap's component manager and look up the resource there.
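The parent delegation can be sketched in a few lines. SitemapManager is a hypothetical class made up for this illustration; a real implementation would rely on the container's own parent/child component managers.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-sitemap component manager with a parent link.
class SitemapManager {
    private final SitemapManager parent; // null for the root sitemap
    private final Map<String, Object> local = new HashMap<>();

    SitemapManager(SitemapManager parent) {
        this.parent = parent;
    }

    void declare(String key, Object component) {
        local.put(key, component);
    }

    // Looks in this sitemap first, then walks up the parent chain.
    // Returns null if no sitemap in the chain declares the key.
    Object lookup(String key) {
        for (SitemapManager m = this; m != null; m = m.parent) {
            Object found = m.local.get(key);
            if (found != null) {
                return found;
            }
        }
        return null;
    }
}
```

A sub-sitemap that redeclares a resource shadows the parent's definition, which is the usual meaning of inheritance here.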
3/ Block-defined sitemap components
A block can provide sitemap (and other) components to other blocks, including virtual components. Nothing special here actually, except for the fact that block inheritance is implemented, once again, by the parent relationship of component managers.
4/ View inheritance
Views are nothing more than virtual serializers, with the main difference that their hint is defined at runtime by the "cocoon-view" parameter. And since these are components, lookup goes up to the parent sitemap if a view is not declared in a given sitemap, thus providing inheritance.
Side note: relative URIs
------------------------
The various considerations about inheritance above lead to the question of how relative source URIs are resolved (Carsten raised this issue some time ago): what is the base URI that should be used by the resolver?
My opinion is that the base URI should be the one of the sitemap _handling_ the request. This means that "jumping" to another sitemap through virtual components or view inheritance should not affect the base URI.
However, there are many situations where we want to use a source relative to the _current_ sitemap regardless of how it's called. For this, I propose a new protocol similar to how "context:" behaves with the root sitemap, but for non-root sitemaps. The "sitemap:" protocol comes to mind, but I'm not sure this is a good name.
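Here is a small sketch of the two resolution rules, using only java.net.URI. The SitemapUriResolver class, its parameter names and the exact syntax accepted after "sitemap:" are assumptions made for illustration; nothing like this exists yet.

```java
import java.net.URI;

// Hypothetical resolver illustrating the two base URIs discussed above.
class SitemapUriResolver {
    private static final String PREFIX = "sitemap:"; // proposed name, not final

    // handlingBase: base URI of the sitemap handling the request.
    // currentBase: base URI of the sitemap where the statement is written.
    static URI resolve(String uri, URI handlingBase, URI currentBase) {
        if (uri.startsWith(PREFIX)) {
            // Strip the scheme and any leading slashes, then resolve
            // against the sitemap that contains the statement.
            String rest = uri.substring(PREFIX.length()).replaceFirst("^/+", "");
            return currentBase.resolve(rest);
        }
        // Plain relative URIs follow the sitemap handling the request;
        // absolute URIs are returned unchanged by URI.resolve.
        return handlingBase.resolve(uri);
    }
}
```

With a handling base of file:/site/sub/ and a current base of file:/site/, "style.xsl" resolves under /site/sub/ while "sitemap:style.xsl" resolves under /site/.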
Performance considerations
--------------------------
When writing the TreeProcessor, great care was taken to pre-analyse everything possible in order to achieve maximum runtime speed. So far I have found only two points where performance degrades with this new approach:
- it's no longer possible to choose the ProcessingNode implementation depending on the class of a component, as is done e.g. in MatchNodeBuilder. In the end, the cost is just an "instanceof" check to choose the right behaviour.
- mapping from view names to their labels is pre-computed in the TreeProcessor for each individual sitemap component, so that the view's ProcessingNode (if any) can be found directly with the view name (see SitemapLanguage.getViewsForStatement and e.g. GenerateNode.invoke()). But, considering that views are only marginally used in a production environment, the few extra lookups can be considered negligible.
Implementation
--------------
The implementation mainly consists of merging the code of the ProcessingNodeBuilder classes into the corresponding ProcessingNode classes.
The initial "flattening" transformation can be implemented in XSL, whose simplicity will make it possible to implement, at this level, some semantic checks that can be difficult to implement otherwise.
However, an important requirement is to keep the location information of sitemap statements. For this I suggest augmenting the sitemap SAX stream by adding Locator information in a "location" attribute on every element. This way, the initial location information can survive any kind of transformation. This augmentation can also be useful in several other contexts such as Woody (it would avoid the dependency on Xerces in DomUil.LocationTrackingDOMParser).
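A possible shape for this augmentation, using only the standard SAX API (XMLFilterImpl, Locator, AttributesImpl). The LocationFilter and LocationDemo names, the "systemId:line:column" value format and the un-namespaced "location" attribute are assumptions of this sketch; a real version would probably put the attribute in its own namespace.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Adds a "location" attribute to every element passing through the filter.
class LocationFilter extends XMLFilterImpl {
    private Locator locator;

    LocationFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator; // keep it to read positions later
        super.setDocumentLocator(locator);
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        AttributesImpl augmented = new AttributesImpl(atts);
        if (locator != null) {
            // Assumed format: systemId:line:column
            String value = locator.getSystemId() + ":"
                    + locator.getLineNumber() + ":" + locator.getColumnNumber();
            augmented.addAttribute("", "location", "location", "CDATA", value);
        }
        super.startElement(uri, localName, qName, augmented);
    }
}

// Small helper showing that downstream handlers see the attribute.
class LocationDemo {
    static List<String> locations(String xml, String systemId) {
        List<String> result = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            LocationFilter filter = new LocationFilter(reader);
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    result.add(qName + "@" + atts.getValue("location"));
                }
            });
            InputSource source = new InputSource(new StringReader(xml));
            source.setSystemId(systemId);
            filter.parse(source);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return result;
    }
}
```

Since the position travels as an ordinary attribute, an XSL "flattening" step only has to copy it along for error messages to keep pointing at the original sitemap file.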
From a security and abuse point of view, I'm wondering if all sitemap statement components should be made visible to other components through the container. If we don't want this, the sitemap engine could consist of two component managers, one containing the "public" statements such as views, resources, virtual components and the contents of <map:component>, and a child "private" manager containing other sitemap statements. This may also allow the public container to be less loaded and therefore faster.
Conclusion
----------
This new approach seems to have very few drawbacks (I hope I did not miss something important), and will lead to a dramatic simplification of the sitemap engine, the most noticeable effect being that the number of classes will be divided by 2.
There's only one implication on Cocoon's core: the ProcessingNode interface is now a public contract between processors, since this is what all these components implement.
The only criticism (yes, there needs to be some ;-) is that I took great care in the TreeProcessor to separate build-time code from run-time code, while the ComponentizedProcessor will merge them in a single class. That separation allows all build-time data structures to be garbage collected once the tree is built, since we will never need them again. I also had the secret hope of being able to serialize the processing tree, in order to use a pre-built tree on small devices (remember, I run Cocoon in small places), but this proved to be difficult if not impossible because components have a lot of relations with non-serializable objects.
I'm wondering if we should write this new sitemap engine in the 2.2 branch or if it should go in the 2.1. Fortress isn't a requirement to implement this, and doing it in the 2.1 would allow us to provide view and resource inheritance before the 2.2 is out.
And I also think we should consider this approach when migrating Woody to CocoonForms, since Woody uses the same mechanism as the TreeProcessor to build widget definition trees.
Thanks again Unico for this brilliant idea.
What do you think, folks?
Sylvain
-- Sylvain Wallez Anyware Technologies http://www.apache.org/~sylvain http://www.anyware-tech.com { XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects } Orixo, the opensource XML business alliance - http://www.orixo.com
