Berin Loritsch wrote:
Stefano Mazzocchi wrote:
On Friday, Oct 17, 2003, at 20:36 Europe/Rome, Berin Loritsch wrote:
I am slowly getting up to speed, and I will eventually get there. The important thing is to grok the big picture, which will help me with the details.
Anyone have a logical view of how the Treebuilder is supposed to work? It would definitely help me in refactoring things. As it is now, the Treebuilder is tightly integrated with the ECM, so it is something that won't work right away... I am just trying to get to a place where I can compile 2.2 so that my testcases will run and I can verify what I am doing works.
Sylvain knows this and, AFAIK, he's one of the few (only?) that does. Which is also something that I'm particularly comfortable with, even if this is clearly not Sylvain's fault if what he writes works as expected and nobody has to go there and fix it ;-)
But a description of the internals of the tree processor would be helpful not only for migration but also for future reference (refactor? cleanup? profile? whatever).
Main steps
----------
The TreeProcessor is set to get the Processor role in the cocoon.roles file.
During the configuration of the TreeProcessor an ExtendedComponentSelector (builderSelector) is set up using the configuration file "treeprocessor-builtins.xml".
During a call to TreeProcessor.process(environment), i.e. the method that takes the environment, applies the sitemap to it and produces the output, the following things happen:
* The method setupRootNode is called (if necessary) and the builderSelector is used to get a TreeBuilder (builder). The build method on the builder is called with the sitemap as argument and a tree of ProcessingNodes corresponding to the sitemap is returned.
* The sitemap is then executed by calling the invoke method for the root node.
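The two steps above can be sketched roughly as follows. This is only an illustration: LazyProcessorSketch, its nested Node and Builder interfaces and the buildCount field are hypothetical stand-ins, not the real TreeProcessor, ProcessingNode or TreeBuilder, and the sketch only covers the "build on first use" case, not rebuilding after a sitemap change.

```java
final class LazyProcessorSketch {
    /** Hypothetical stand-in for a processing node. */
    interface Node {
        boolean invoke(StringBuilder environment);
    }

    /** Hypothetical stand-in for a tree builder. */
    interface Builder {
        Node build(String sitemapSource);
    }

    private final Builder builder;
    private final String sitemapSource;
    private Node rootNode;   // stays null until the first request
    int buildCount = 0;      // only here to make the lazy build observable

    LazyProcessorSketch(Builder builder, String sitemapSource) {
        this.builder = builder;
        this.sitemapSource = sitemapSource;
    }

    /** Builds the tree if needed, then executes it by invoking the root node. */
    boolean process(StringBuilder environment) {
        if (rootNode == null) {
            setupRootNode();
        }
        return rootNode.invoke(environment);
    }

    private void setupRootNode() {
        rootNode = builder.build(sitemapSource);
        buildCount++;
    }

    public static void main(String[] args) {
        LazyProcessorSketch processor = new LazyProcessorSketch(
            source -> environment -> {
                environment.append("output of ").append(source);
                return true;
            },
            "sitemap.xmap");
        processor.process(new StringBuilder());
        processor.process(new StringBuilder());
        System.out.println("tree built " + processor.buildCount + " time(s)");
    }
}
```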
Building the tree
-----------------
In Cocoon, using "treeprocessor-builtins.xml", SitemapLanguage, which extends DefaultTreeBuilder, is used as the TreeBuilder. Within the DefaultTreeBuilder (during execution of the build method) a RoleManager is set up based on the "roles" section of "treeprocessor-builtins.xml" and an ExtendedComponentSelector is set up based on the "nodes" section. The "nodes" section associates the sitemap concepts with the appropriate ProcessingNodeBuilders. It also configures each ProcessingNodeBuilder so that it knows which types of children it is allowed to have and which ones are forbidden.
The build process starts (in the method createTree) by creating the ProcessingNodeBuilder (rootBuilder) that corresponds to the root element in the sitemap, associating the rootBuilder with the current TreeBuilder and calling the rootBuilder.buildNode method with the configuration tree created from the sitemap.
The FooNodeBuilder.buildNode method creates and returns a FooNode object and recursively creates the child nodes of the object by creating and executing the corresponding builder objects.
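As a rough sketch of that recursion, assuming a much simplified model: the Config and Node classes below are hypothetical stand-ins for the Avalon configuration tree and the processing nodes, and a single static buildNode method plays the role that the per-element builder classes play in the real code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RecursiveBuildSketch {
    /** Hypothetical stand-in for the configuration tree created from the sitemap. */
    static final class Config {
        final String name;
        final List<Config> children;

        Config(String name, Config... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    /** Hypothetical runtime node: remembers its kind and its child nodes. */
    static final class Node {
        final String kind;
        final List<Node> children;

        Node(String kind, List<Node> children) {
            this.kind = kind;
            this.children = children;
        }

        /** Counts this node plus all of its descendants. */
        int size() {
            int count = 1;
            for (Node child : children) {
                count += child.size();
            }
            return count;
        }
    }

    /** Builds the node for one element, recursing into its child elements first. */
    static Node buildNode(Config config) {
        List<Node> children = new ArrayList<>();
        for (Config child : config.children) {
            children.add(buildNode(child));
        }
        return new Node(config.name, children);
    }

    public static void main(String[] args) {
        Config sitemap = new Config("pipelines",
            new Config("pipeline",
                new Config("match", new Config("generate"), new Config("serialize"))));
        Node root = buildNode(sitemap);
        System.out.println(root.kind + " with " + root.size() + " nodes");
    }
}
```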
Executing the tree
------------------
While (recursively) executing the invoke(environment, context) method for the node objects in the tree, a Pipeline object is constructed and stored in the context object (other things happen as well). When a SerializeNode is invoked, the current Pipeline is processed and the output is stored in the environment.
----------------------------------
<sidenote>
I built a Cocoon-inspired signal processing framework about a year ago and tried to reuse Sylvain's framework. While most of it is very general, there are some Cocoon-specific details in the Context and Environment interfaces, so I ended up building something similar but simpler instead.
</sidenote>
HTH
/Daniel
Nice explanation, Daniel! I'm happy to see that other people understand this.
However, I'd like to add some background to explain why it works this way, some additional details, and what we could eventually refactor to ease the migration to Fortress.
I started the TreeProcessor for two reasons.
The first reason was that the sitemap engine at that time was compiled into a Java class like XSP. But the sitemap logicsheet was very complex and recompiling a large sitemap took ages (more than 20 seconds on the samples sitemap), leading to painful try/fail cycles. We needed something faster.
The second reason was that at that time (autumn 2001), a number of RTs were written related to what we called "flowmaps", which later led to flowscript. These RTs described new ways to build a pipeline that take flow into account, but no real code was written to test these ideas, because deeply changing the way the sitemap code was generated was very painful: finding one's way through the 2000-line XSLT was not easy.
So I decided to consider another approach, based on an evaluation tree (hence TreeProcessor), each node in the tree corresponding to a xxxmap instruction (sitemap or flowmap).
An additional motivation for me was that it would require me to heavily use the Avalon concepts and therefore increase my knowledge in this area. This was mostly written at home, and my wife deserves many thanks, because this thing took my brain day and night for more than 2 months ;-)
The main idea of the TreeProcessor is that each kind of instruction (e.g. <map:act>, <map:generate>, etc.) is described by two classes:
- a ProcessingNode, the runtime object that will execute the instruction,
- a ProcessingNodeBuilder, responsible for creating the ProcessingNode with the appropriate data and/or childnodes, extracted from attributes, child elements, etc.
Implementing the sitemap language then translates into writing the appropriate ProcessingNodeBuilder classes for all statements of the language. But since we were discussing flowmaps and other pipeline construction approaches, I wanted this to be easily extensible, and even allow the simultaneous use of different languages in the system (sitemap/flowmap). This is why <map:mount> supports an additional undocumented and never used "language" attribute (see MountNodeBuilder)
So the TreeProcessor configuration contains the definition of TreeBuilder implementations for various "languages", the sitemap being the only one we have today. The whole configuration document is actually a ComponentSelector for TreeBuilder implementations. The SitemapLanguage class is the implementation of TreeBuilder for the sitemap language. A TreeBuilder builds a processing node tree based on a file (e.g. sitemap.xmap) that is read into an Avalon configuration (this was chosen for its ease of use compared to raw DOM).
<fortress-migration>
Obviously, this initial selector can be removed and the sitemap language be the only one available, as we now have the flowscript and it's very unlikely that we will redesign a new pipeline language in the near (or even distant) future.
</fortress-migration>
Roles, selectors and <map:components>
-------------------------------------
The <map:components> section of a sitemap is used to configure a ComponentManager (child of either the parent sitemap's manager or the main manager), and the <roles> section of the TreeProcessor configuration defines a RoleSelector that is used by this manager. For the sitemap, it defines the shorthands that will map <map:generators>, <map:selectors>, etc, to a special "ComponentsSelector" (yeah, the name could be better).
This ComponentsSelector handles the <map:components> syntax ("src" and not "class", etc), and holds the "default" attribute, view labels and mime types for each hint (these are not known by the components themselves).
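To make the idea of per-hint metadata concrete, here is a minimal sketch of a selector that keeps the default hint, label and mime type next to each component. Everything in it (ComponentsSelectorSketch, Entry, the method names, the sample hints) is hypothetical and is not the actual ComponentsSelector API.

```java
import java.util.HashMap;
import java.util.Map;

final class ComponentsSelectorSketch {
    /** One declared component plus the metadata the component itself does not know. */
    static final class Entry {
        final Object component;
        final String label;
        final String mimeType;

        Entry(Object component, String label, String mimeType) {
            this.component = component;
            this.label = label;
            this.mimeType = mimeType;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final String defaultHint;

    ComponentsSelectorSketch(String defaultHint) {
        this.defaultHint = defaultHint;
    }

    void add(String hint, Object component, String label, String mimeType) {
        entries.put(hint, new Entry(component, label, mimeType));
    }

    /** Resolves a hint, falling back to the default hint when none is given. */
    private Entry entry(String hint) {
        String key = (hint == null) ? defaultHint : hint;
        Entry found = entries.get(key);
        if (found == null) {
            throw new IllegalArgumentException("No component for hint '" + key + "'");
        }
        return found;
    }

    Object select(String hint) { return entry(hint).component; }

    String getLabel(String hint) { return entry(hint).label; }

    String getMimeType(String hint) { return entry(hint).mimeType; }

    public static void main(String[] args) {
        ComponentsSelectorSketch serializers = new ComponentsSelectorSketch("html");
        serializers.add("html", "an HTML serializer", null, "text/html");
        serializers.add("xml", "an XML serializer", "content", "text/xml");
        System.out.println(serializers.select(null) + " / " + serializers.getMimeType(null));
    }
}
```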
<fortress-migration>
AFAIU, Fortress allows defaults for a collection of components implementing the same role, but I don't know how we can handle the additional "label" and "mime-type", which are not handled by the component itself.
Can we imagine a "fake" selector that routes calls to select() to the manager and handles this additional information on its own?
</fortress-migration>
Building the processing tree
----------------------------
The second section in a language configuration, <nodes>, defines a ComponentSelector for ProcessingNodeBuilders. For each element encountered in the sitemap source file, the corresponding node builder is fetched from this selector with the local name of the element as the selection hint, i.e. <map:act> will lead to selector.select("act").
The content of each <node> element is the specific Avalon configuration of the corresponding ProcessingNodeBuilder and mostly defines the allowed child statements.
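The lookup by local name and the allowed-children check can be sketched like this. The class and method names are hypothetical, and the child lists in main are made up for the example rather than copied from the real configuration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class NodeBuilderSelectorSketch {
    /** Hypothetical builder configuration: a hint plus the child elements it accepts. */
    static final class BuilderConfig {
        final String hint;
        final Set<String> allowedChildren;

        BuilderConfig(String hint, String... allowedChildren) {
            this.hint = hint;
            this.allowedChildren = new HashSet<>(Arrays.asList(allowedChildren));
        }

        boolean allows(String childLocalName) {
            return allowedChildren.contains(childLocalName);
        }
    }

    private final Map<String, BuilderConfig> builders = new HashMap<>();

    void register(BuilderConfig config) {
        builders.put(config.hint, config);
    }

    /** Strips the namespace prefix, so "map:act" gives the hint "act". */
    static String localName(String qualifiedName) {
        int colon = qualifiedName.indexOf(':');
        return colon < 0 ? qualifiedName : qualifiedName.substring(colon + 1);
    }

    /** Returns the builder configuration registered for a hint, or fails loudly. */
    BuilderConfig select(String hint) {
        BuilderConfig config = builders.get(hint);
        if (config == null) {
            throw new IllegalArgumentException("No node builder for hint '" + hint + "'");
        }
        return config;
    }

    public static void main(String[] args) {
        NodeBuilderSelectorSketch selector = new NodeBuilderSelectorSketch();
        // Illustrative child lists only.
        selector.register(new BuilderConfig("act", "generate", "serialize"));
        BuilderConfig act = selector.select(localName("map:act"));
        System.out.println("act allows generate: " + act.allows("generate"));
        System.out.println("act allows components: " + act.allows("components"));
    }
}
```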
Now a sitemap is not a tree, but a graph because of resources and views that can be called from any point in the sitemap. To handle this, building the processing tree follows two phases:
- the whole node tree is built, and nodes that other nodes can link (or jump) to are registered in the common TreeBuilder by their respective node builders (see TreeBuilder.registerNode()).
- then those node builders that implement LinkedProcessingNodeBuilder are asked to link their node, which they do by fetching the appropriate node registered in the first phase.
We then obtain an evaluation tree (in reality a graph) that is ready for use. All build-time related components are then released.
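The two phases can be illustrated with a small sketch. All names here (TwoPhaseBuildSketch, addResource, addCall, linkNodes) are hypothetical; in the real code the registration goes through TreeBuilder.registerNode() and the linking is done by the node builders themselves.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TwoPhaseBuildSketch {
    static class Node {
        final String name;

        Node(String name) {
            this.name = name;
        }
    }

    /** A node that jumps to another node; its target is resolved in phase two. */
    static final class CallNode extends Node {
        final String targetName;
        Node target;   // null until linkNodes() has run

        CallNode(String name, String targetName) {
            super(name);
            this.targetName = targetName;
        }
    }

    private final Map<String, Node> registered = new HashMap<>();
    private final List<CallNode> toLink = new ArrayList<>();

    /** Phase one: a node that can be jumped to is registered under its name. */
    Node addResource(String name) {
        Node node = new Node(name);
        registered.put(name, node);
        return node;
    }

    /** Phase one: a calling node is created, but its target is not looked up yet. */
    CallNode addCall(String name, String targetName) {
        CallNode call = new CallNode(name, targetName);
        toLink.add(call);
        return call;
    }

    /** Phase two: every calling node fetches the target registered in phase one. */
    void linkNodes() {
        for (CallNode call : toLink) {
            Node target = registered.get(call.targetName);
            if (target == null) {
                throw new IllegalStateException("Unknown target '" + call.targetName + "'");
            }
            call.target = target;
        }
    }

    public static void main(String[] args) {
        TwoPhaseBuildSketch builder = new TwoPhaseBuildSketch();
        // The call appears before the resource it refers to, which is why
        // linking has to wait until the whole tree has been built.
        CallNode call = builder.addCall("call-1", "header");
        builder.addResource("header");
        builder.linkNodes();
        System.out.println(call.name + " -> " + call.target.name);
    }
}
```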
Note also that a ProcessingNode is considered a "non-managed component": with the help of the LifecycleHelper class, the TreeBuilder honours any of the Avalon lifecycle interfaces that a node implements. This is required as many nodes need access to the component selectors defined by <map:components>. Disposable nodes are collected in a list that the TreeProcessor traverses when needed (sitemap change or system disposal).
Great care has been taken to cleanly separate build-time and run-time code and data, to ensure the smallest memory occupation and the fastest possible execution. This led this interpreted engine to be a bit faster at runtime than the compiled one (build time is more than 20 times faster).
<fortress-migration>
An optimisation that is done and may be relevant to migration to Fortress is that ThreadSafe components are looked up as part of the tree building and never looked up again later (see e.g. MatchNode). AFAIU, lifestyle interfaces no longer exist with Fortress, so this optimisation may be difficult to do, if not impossible.
</fortress-migration>
Building a pipeline
-------------------
When a request has to be processed, the TreeProcessor calls invoke() on the root node of the evaluation tree. This method has two parameters: the environment defining the request, and an InvokeContext that mainly holds the pipeline that is being built and the stack of sitemap variables.
The invoke method executes all processing nodes (depth first) until one of them returns "true", meaning that a pipeline was successfully built. Examples of nodes that return true are serializers, readers and redirects.
If the environment is external, the pipeline is executed as soon as it is ended (i.e. in the reader or serializer node). But if the environment is internal (i.e. a "cocoon:" source), it is not, meaning the pipeline is returned to the SitemapSource, ready for later execution if so requested (e.g. by a Source.getInputStream()).
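A simplified sketch of this invocation scheme follows. The Context, Environment and Node types are hypothetical stand-ins for InvokeContext, the Cocoon environment and ProcessingNode, and a pipeline is modelled as a plain list of step names.

```java
import java.util.ArrayList;
import java.util.List;

final class InvokeSketch {
    /** Hypothetical stand-in for InvokeContext: holds the pipeline being built. */
    static final class Context {
        final List<String> pipeline = new ArrayList<>();
        boolean executed = false;
    }

    /** Hypothetical environment: external requests get output, internal ones do not. */
    static final class Environment {
        final boolean external;
        String output;

        Environment(boolean external) {
            this.external = external;
        }
    }

    interface Node {
        boolean invoke(Environment environment, Context context);
    }

    /** Invokes children in order, stopping at the first one that ends a pipeline. */
    static Node container(Node... children) {
        return (environment, context) -> {
            for (Node child : children) {
                if (child.invoke(environment, context)) {
                    return true;
                }
            }
            return false;
        };
    }

    /** Adds a step to the pipeline without ending it. */
    static Node step(String name) {
        return (environment, context) -> {
            context.pipeline.add(name);
            return false;
        };
    }

    /** Ends the pipeline and executes it only when the environment is external. */
    static Node serializer(String name) {
        return (environment, context) -> {
            context.pipeline.add(name);
            if (environment.external) {
                environment.output = String.join(" -> ", context.pipeline);
                context.executed = true;
            }
            return true;
        };
    }

    public static void main(String[] args) {
        Node root = container(
            container(step("generate"), step("transform"), serializer("serialize")),
            step("never reached"));
        Environment request = new Environment(true);
        root.invoke(request, new Context());
        System.out.println(request.output);
    }
}
```

For an internal environment the same tree still returns true, but the pipeline is left unexecuted in the context for the caller to run later.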
Phew... I finally explained the whole thing in depth. I'm no longer the only one who knows ;-)
I'll also put this into the wiki.
Sylvain
--
Sylvain Wallez, Anyware Technologies
http://www.apache.org/~sylvain
http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
Orixo, the opensource XML business alliance - http://www.orixo.com
