[DAISY] Updated: Cocoon Sitemap internals

daisy Thu, 06 Oct 2005 02:07:50 -0700

A document has been updated:

http://cocoon.zones.apache.org/daisy/documentation/732.html


Document ID: 732
Branch: main
Language: default
Name: Cocoon Sitemap internals (unchanged)
Document Type: Document (unchanged)
Updated on: 10/6/05 9:03:37 AM
Updated by: Helma van der Linden

A new version has been created, state: publish

Parts
=====
Content
-------
This part has been updated.
Mime type: text/xml (unchanged)
File name:  (unchanged)
Size: 15623 bytes (previous version: 3528 bytes)
Content diff:
(5 equal lines skipped)
    - links to apidocs of classes mentioned</p>
    
    <p>This information pertains to both Cocoon 2.1 and Cocoon 2.2, although the
--- classnames mentioned are Cocoon 2.2.</p>
+++ classnames mentioned are Cocoon 2.2. The process will be described in general
+++ first. This allows users to get a general grasp of how the process works. After
+++ that the process is described in more detail including classnames.</p>
    
    <h1>Sitemap processing</h1>
    
(30 equal lines skipped)
    request. This results in a single path through the tree with only a generator,
    transformers and a serializer as components.</p>
    
--- <h2>Phase 3: Executing the pipeline</h2>
+++ <h2>Phase 3: Execute the pipeline</h2>
    
    <p>Once the pipeline is built in the previous phase, its execution is invoked by
    calling generator.generate().</p>
(37 equal lines skipped)
    
    <p><img src="daisy:737"/></p>
    
+++ <h1>In-depth explanation</h1>
+++ 
+++ <p>We will now go over the process again but this time in more detail.</p>
+++ 
+++ <h2>Phase 1: Build the sitemap tree</h2>
+++ 
+++ <p>The TreeProcessor is set to get the Processor role in the cocoon.roles 
file.
+++ <br/>
+++ During the configuration of the TreeProcessor an ExtendedComponentSelector
+++ (builderSelector) is set up using the configuration file
+++ "treeprocessor-builtins.xml".</p>
+++ 
+++ <p>&gt; <br/>
+++ &gt; While calling TreeProcessor.process(environment), i.e. the method that
+++ <br/>
+++ &gt; takes the environment, applies the sitemap on it and produces the  
output,
+++ <br/>
+++ &gt; the following things happen:<br/>
+++ &gt; <br/>
+++ &gt; * The method setupRootNode is called (if necessary) and the<br/>
+++ &gt; builderSelector is used to get a TreeBuilder (builder). The build 
method
+++ <br/>
+++ &gt; on the builder is called with the sitemap as argument and a tree of 
<br/>
+++ &gt; ProcessingNodes corresponding to the sitemap is returned.<br/>
+++ &gt; <br/>
+++ &gt; * The sitemap is then executed by calling the invoke method for the 
root
+++ <br/>
+++ &gt; node.<br/>
+++ &gt; <br/>
+++ &gt; Building the tree<br/>
+++ &gt; -----------------<br/>
+++ &gt; <br/>
+++ &gt; In Cocoon using "treeprocessor-builtins.xml" SitemapLanguage that  
extends
+++ <br/>
+++ &gt; DefaultTreeBuilder is used as TreeBuilder. Within the<br/>
+++ &gt; DefaultTreeBuilder (during execution of the build method) a RoleManager
+++ <br/>
+++ &gt; is set up based on the "roles" section of "treeprocessor-builtins.xml"
+++ <br/>
+++ &gt; and an ExtendedComponentSelector is set up based on the "nodes" section.
+++ <br/>
+++ &gt; The "nodes" section associates the sitemap concepts to the appropriate
+++ <br/>
+++ &gt; ProcessingNodeBuilders. It also configures a ProcessingNodeBuilder so
+++ <br/>
+++ &gt; that it knows what type of children it is allowed to have and which ones
+++ <br/>
+++ &gt; are forbidden.<br/>
+++ &gt; <br/>
+++ &gt; The build process starts (in the method createTree) by creating the 
<br/>
+++ &gt; ProcessingNodeBuilder (rootBuilder) that corresponds to the root 
element
+++ <br/>
+++ &gt; in the sitemap, associate the rootBuilder to the current TreeBuilder 
and
+++ <br/>
+++ &gt; call the rootBuilder.buildNode method with the configuration tree  
created
+++ <br/>
+++ &gt; from the sitemap.<br/>
+++ &gt; <br/>
+++ &gt; The FooNodeBuilder.buildNode method creates and returns a FooNode 
object
+++ <br/>
+++ &gt; and recursively creates the child nodes of the object by creating and<br/>
+++ &gt; executing the corresponding builder objects.<br/>
+++ &gt; <br/>
+++ &gt; Executing the tree<br/>
+++ &gt; ------------------<br/>
+++ &gt; <br/>
+++ &gt; While (recursively) executing the invoke(environment, context) method for
+++ <br/>
+++ &gt; the node objects in the tree a Pipeline object is constructed that is
+++ <br/>
+++ &gt; stored in the context object (other things happen as well). When a <br/>
+++ &gt; SerializeNode is invoked, the current Pipeline is processed and the <br/>
+++ &gt; output is stored in the environment.<br/>
+++ &gt; <br/>
+++ &gt; ----------------------------------<br/>
+++ &gt; <br/>
+++ &gt; &lt;sidenote&gt;<br/>
+++ &gt; I built a Cocoon-inspired signal processing framework about a year ago
+++ <br/>
+++ &gt; and tried to reuse Sylvain's framework. While most of it is very<br/>
+++ &gt; general, there are some Cocoon specific details in the Context and 
<br/>
+++ &gt; Environment interfaces, so I ended up building something similar but
+++ <br/>
+++ &gt; simpler instead.<br/>
+++ &gt; &lt;/sidenote&gt;<br/>
+++ &gt; <br/>
+++ &gt; HTH<br/>
+++ &gt; <br/>
+++ &gt; /Daniel<br/>
+++ &gt; <br/>
+++ &gt;</p>
+++ 
+++ <p>Nice explanation, Daniel! I'm happy to see that other people understand
+++ <br/>
+++ this.</p>
+++ 
+++ <p>However, I'd like to add some background to this to explain why it does
+++ <br/>
+++ work this way, some additional details and what we could eventually <br/>
+++ refactor to ease the migration to Fortress.</p>
+++ 
+++ <p>I started the TreeProcessor for two reasons.</p>
+++ 
+++ <p>The first reason was that the sitemap engine at that time was compiled 
<br/>
+++ into a Java class like XSP. But the sitemap logicsheet was very complex 
<br/>
+++ and recompiling a large sitemap took ages (more than 20 seconds on the <br/>
+++ samples sitemap), leading to painful try/fail cycles. We needed <br/>
+++ something faster.</p>
+++ 
+++ <p>The second reason was that at that time (autumn 2001), a number of RTs 
<br/>
+++ were written related to what we called "flowmaps" and later led to <br/>
+++ flowscript. These RTs were describing new ways to build a pipeline to <br/>
+++ take flow into account, but no real code was written to test these <br/>
+++ ideas, because deeply changing the way the sitemap code was generated <br/>
+++ was very painful: finding one's way around the 2000-line XSLT was not easy.</p>
+++ 
+++ <p>So I decided to consider another approach, based on an evaluation tree 
<br/>
+++ (hence TreeProcessor), each node in the tree corresponding to a xxxmap <br/>
+++ instruction (sitemap or flowmap).</p>
+++ 
+++ <p>An additional motivation for me was that it would require me to heavily
+++ <br/>
+++ use the Avalon concepts and therefore increase my knowledge in this <br/>
+++ area. This was mostly written at home, and my wife deserves many thanks, 
<br/>
+++ because this thing took my brain day and night for more than 2 months 
;-)</p>
+++ 
+++ <p>The main idea of the TreeProcessor is that each kind of instruction <br/>
+++ (e.g. &lt;map:act&gt;, &lt;map:generate&gt;, etc.) is described by two classes:
+++ <br/>
+++ - a ProcessingNode, the runtime object that will execute the instruction,<br/>
+++ - a ProcessingNodeBuilder, responsible for creating the ProcessingNode <br/>
+++ with the appropriate data and/or childnodes, extracted from attributes, 
<br/>
+++ child elements, etc.</p>
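The node/builder pairing described above could be sketched roughly as follows. This is only an illustration under simplifying assumptions: the interfaces here are hypothetical stand-ins with reduced signatures (the real ones work with an Environment, an InvokeContext and an Avalon Configuration), and GenerateNode, GenerateNodeBuilder and the trace list are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical stand-ins for the two halves of an instruction.
public class NodePairSketch {

    /** Runtime half: executes one sitemap instruction. */
    interface ProcessingNode {
        boolean invoke(List<String> trace);
    }

    /** Build-time half: creates the node from data found in the sitemap. */
    interface ProcessingNodeBuilder {
        ProcessingNode buildNode(String srcAttribute);
    }

    /** Hypothetical runtime node for something like <map:generate src="..."/>. */
    static final class GenerateNode implements ProcessingNode {
        private final String src;

        GenerateNode(String src) {
            this.src = src;
        }

        @Override
        public boolean invoke(List<String> trace) {
            trace.add("generate:" + src);
            return false; // a generator alone does not complete a pipeline
        }
    }

    /** Hypothetical builder: extracts the attribute and hands it to the node. */
    static final class GenerateNodeBuilder implements ProcessingNodeBuilder {
        @Override
        public ProcessingNode buildNode(String srcAttribute) {
            return new GenerateNode(srcAttribute);
        }
    }

    /** Builds one node, runs it once and returns what it recorded. */
    static List<String> demo() {
        ProcessingNode node = new GenerateNodeBuilder().buildNode("index.xml");
        List<String> trace = new ArrayList<>();
        node.invoke(trace);
        return trace;
    }
}
```

Once buildNode() has returned, the builder is no longer needed, which is what allows build-time objects to be discarded before requests are served.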
+++ 
+++ <p>Implementing the sitemap language then translates into writing the <br/>
+++ appropriate ProcessingNodeBuilder classes for all statements of the <br/>
+++ language. But since we were discussing flowmaps and other pipeline <br/>
+++ construction approaches, I wanted this to be easily extensible, and even 
<br/>
+++ allow the simultaneous use of different languages in the system <br/>
+++ (sitemap/flowmap). This is why &lt;map:mount&gt; supports an additional 
<br/>
+++ undocumented and never used "language" attribute (see MountNodeBuilder).</p>
+++ 
+++ <p>So the TreeProcessor configuration contains the definition of <br/>
+++ TreeBuilder implementations for various "languages", the sitemap being <br/>
+++ the only one we have today. The whole configuration document is actually 
<br/>
+++ a ComponentSelector for TreeBuilder implementations. The SitemapLanguage 
<br/>
+++ class is the implementation of TreeBuilder for the sitemap language. A <br/>
+++ TreeBuilder builds a processing node tree based on a file (e.g. <br/>
+++ sitemap.xmap) that is read in an Avalon configuration (this was chosen <br/>
+++ for its ease of use compared to raw DOM).</p>
+++ 
+++ <p>&lt;fortress-migration&gt;<br/>
+++ Obviously, this initial selector can be removed and the sitemap language 
<br/>
+++ be the only one available, as we now have the flowscript and it's very <br/>
+++ unlikely that we will redesign a new pipeline language in the near (or <br/>
+++ even distant) future.<br/>
+++ &lt;/fortress-migration&gt;</p>
+++ 
+++ <p>Roles, selectors and &lt;map:components&gt;<br/>
+++ -------------------------------------</p>
+++ 
+++ <p>The &lt;map:components&gt; section of a sitemap is used to configure a 
<br/>
+++ ComponentManager (child of either the parent sitemap's manager or the <br/>
+++ main manager), and the &lt;roles&gt; section of the TreeProcessor <br/>
+++ configuration defines a RoleSelector that is used by this manager. For <br/>
+++ the sitemap, it defines the shorthands that will map &lt;map:generators&gt;,
+++ <br/>
+++ &lt;map:selectors&gt;, etc, to a special "ComponentsSelector" (yeah, the 
name
+++ <br/>
+++ could be better).</p>
+++ 
+++ <p>This ComponentsSelector handles the &lt;map:components&gt; syntax ("src" 
and
+++ <br/>
+++ not "class", etc), and holds the "default" attribute, view labels and <br/>
+++ mime types for each hint (these are not known by the components themselves).</p>
+++ 
+++ <p>&lt;fortress-migration&gt;<br/>
+++ AFAIU, Fortress allows defaults for a collection of components <br/>
+++ implementing the same role, but I don't know how we can handle the <br/>
+++ additional "label" and "mime-type", which are not handled by the <br/>
+++ component itself.</p>
+++ 
+++ <p>Can we imagine a "fake" selector that routes calls to select() to the <br/>
+++ manager and handles this additional information on its own?<br/>
+++ &lt;/fortress-migration&gt;</p>
+++ 
+++ <p>Building the processing tree<br/>
+++ ----------------------------</p>
+++ 
+++ <p>The second section in a language configuration, &lt;nodes&gt;, defines a
+++ <br/>
+++ ComponentSelector for ProcessingNodeBuilders. For each element <br/>
+++ encountered in the sitemap source file, the corresponding node builder <br/>
+++ is fetched from this selector with the local name of the element as the 
<br/>
+++ selection hint, i.e. &lt;map:act&gt; will lead to 
selector.select("act").</p>
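The lookup by local name might look something like the sketch below. It assumes a plain map in place of the Avalon ComponentSelector, and the builder labels ("builder-for-act" and so on) are placeholders rather than real class names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the selector configured by the <nodes> section.
public class BuilderLookupSketch {

    // hint (element local name) -> placeholder label for a node builder
    private final Map<String, String> builders = new HashMap<>();

    void register(String hint, String builderLabel) {
        builders.put(hint, builderLabel);
    }

    /** Strips the namespace prefix: "map:act" becomes "act". */
    static String localName(String qualifiedName) {
        int colon = qualifiedName.indexOf(':');
        return colon < 0 ? qualifiedName : qualifiedName.substring(colon + 1);
    }

    /** Returns the builder registered for the hint, or fails loudly. */
    String select(String hint) {
        String builder = builders.get(hint);
        if (builder == null) {
            throw new IllegalArgumentException("No node builder for hint '" + hint + "'");
        }
        return builder;
    }

    /** Looks up the builder for a <map:act> element. */
    static String demo() {
        BuilderLookupSketch selector = new BuilderLookupSketch();
        selector.register("act", "builder-for-act");
        selector.register("generate", "builder-for-generate");
        return selector.select(localName("map:act"));
    }
}
```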
+++ 
+++ <p>The content of each &lt;node&gt; element is the specific Avalon
+++ configuration <br/>
+++ of the corresponding ProcessingNodeBuilder and mostly defines the allowed <br/>
+++ child statements.</p>
+++ 
+++ <p>Now a sitemap is not a tree, but a graph because of resources and views
+++ <br/>
+++ that can be called from any point in the sitemap. To handle this, <br/>
+++ building the processing tree follows two phases:<br/>
+++ - the whole node tree is built, and nodes that other nodes can link (or 
<br/>
+++ jump) to are registered in the common TreeBuilder by their respective <br/>
+++ node builders (see TreeBuilder.registerNode()).<br/>
+++ - then those node builders that implement <br/>
+++ LinkedProcessingNodeBuilder are asked to link their node, which they do by <br/>
+++ fetching the appropriate node registered in the first phase.</p>
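A minimal sketch of this register-then-link scheme, assuming a much reduced model: the Node class, buildResource(), buildCall() and linkAll() are hypothetical names, and only the idea of registering targets first and resolving references afterwards is taken from the description above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical two-phase build: register targets first, resolve links second.
public class TwoPhaseBuildSketch {

    static final class Node {
        final String name;
        Node target; // filled in during the linking phase

        Node(String name) {
            this.name = name;
        }
    }

    private final Map<String, Node> registered = new HashMap<>();
    private final List<Runnable> pendingLinks = new ArrayList<>();

    /** Phase 1: a node that others may jump to registers itself by name. */
    Node buildResource(String name) {
        Node node = new Node(name);
        registered.put(name, node);
        return node;
    }

    /** Phase 1: a calling node only records that it needs linking later. */
    Node buildCall(String resourceName) {
        Node call = new Node("call:" + resourceName);
        pendingLinks.add(() -> {
            Node target = registered.get(resourceName);
            if (target == null) {
                throw new IllegalStateException("Unknown resource '" + resourceName + "'");
            }
            call.target = target;
        });
        return call;
    }

    /** Phase 2: every deferred link is resolved against the registry. */
    void linkAll() {
        pendingLinks.forEach(Runnable::run);
        pendingLinks.clear();
    }

    /** The call is built before the resource it refers to exists. */
    static String demo() {
        TwoPhaseBuildSketch builder = new TwoPhaseBuildSketch();
        Node call = builder.buildCall("header");
        builder.buildResource("header");
        builder.linkAll();
        return call.target.name;
    }
}
```

Deferring the lookup is what lets a call appear in the source before the resource it refers to.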
+++ 
+++ <p>We then obtain an evaluation tree (in reality a graph) that is ready for
+++ <br/>
+++ use. All build-time related components are then released.</p>
+++ 
+++ <p>It is to be noted also, that a ProcessingNode is considered as a <br/>
+++ "non-managed component": with the help of the LifecycleHelper class, the 
<br/>
+++ TreeBuilder honours any of the Avalon lifecycle interfaces that a node <br/>
+++ implements. This is required as many nodes require access to the <br/>
+++ component selectors defined by &lt;map:components&gt;. Disposable nodes are
+++ <br/>
+++ collected in a list that the TreeProcessor traverses when needed <br/>
+++ (sitemap change or system disposal).</p>
+++ 
+++ <p>Great care has been taken to cleanly separate build-time and run-time 
<br/>
+++ code and data, to ensure the smallest memory occupation and the fastest 
<br/>
+++ possible execution. This led this interpreted engine to be a bit faster <br/>
+++ at runtime than the compiled one (build time is more than 20 times faster).</p>
+++ 
+++ <p>&lt;fortress-migration&gt;<br/>
+++ An optimisation that is done and may be relevant to migration to <br/>
+++ Fortress is that ThreadSafe components are looked up as part of the tree 
<br/>
+++ building and never looked up again later (see e.g. MatchNode). AFAIU, <br/>
+++ lifestyle interfaces no longer exist with Fortress, so this optimisation <br/>
+++ may be difficult to do, if not impossible.<br/>
+++ &lt;/fortress-migration&gt;</p>
+++ 
+++ <p>Building a pipeline<br/>
+++ -------------------</p>
+++ 
+++ <p>When a request has to be processed, the TreeProcessor calls invoke() on
+++ <br/>
+++ the root node of the evaluation tree. This method has two parameters: <br/>
+++ the environment defining the request, and an InvokeContext that mainly <br/>
+++ holds the pipeline that is being built and the stack of sitemap 
variables.</p>
+++ 
+++ <p>The invoke method executes all processing nodes (depth first) until one
+++ <br/>
+++ of them returns "true", meaning that a pipeline was successfully built. <br/>
+++ Examples of nodes that return true are serializers, readers and redirects.</p>
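The depth-first traversal that stops at the first node returning true can be illustrated with the following sketch. It is not the actual TreeProcessor code: the Node interface, leaf() and container() are invented for the example, and a plain list of strings stands in for the pipeline held by the InvokeContext.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of invoke(): walk depth first, stop at the first "true".
public class InvokeSketch {

    interface Node {
        boolean invoke(List<String> pipeline);
    }

    /** A node that adds itself to the pipeline; 'completes' marks e.g. a serializer. */
    static Node leaf(String name, boolean completes) {
        return pipeline -> {
            pipeline.add(name);
            return completes;
        };
    }

    /** A node with children: invokes them in order until one returns true. */
    static Node container(Node... children) {
        return pipeline -> {
            for (Node child : children) {
                if (child.invoke(pipeline)) {
                    return true; // pipeline is complete, skip the remaining nodes
                }
            }
            return false;
        };
    }

    /** Builds a small tree and returns the components collected on the way. */
    static List<String> demo() {
        Node root = container(
                container(
                        leaf("generate", false),
                        leaf("transform", false),
                        leaf("serialize", true)),
                leaf("never-reached", false));
        List<String> pipeline = new ArrayList<>();
        boolean built = root.invoke(pipeline);
        if (!built) {
            throw new IllegalStateException("no pipeline was built");
        }
        return pipeline;
    }
}
```

In this model the node after the serializer is never invoked, since the traversal ends as soon as one node reports a complete pipeline.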
+++ 
+++ <p>If the environment is external, the pipeline is executed as soon as it 
<br/>
+++ is ended (i.e. in the reader or serializer node). But if the environment 
<br/>
+++ is internal (i.e. a "cocoon:" source), it is not, meaning the pipeline <br/>
+++ is returned to the SitemapSource, ready for later execution if so <br/>
+++ requested (e.g. by a Source.getInputStream()).</p>
+++ 
+++ <p>Phew... I finally explained the whole thing in depth. I'm no longer the <br/>
+++ only one to know ;-)<br/>
+++ I'll also put this into the wiki.</p>
+++ 
    </body>
    </html>


Fields
======
no changes

Links
=====
no changes

Custom Fields
=============
no changes

Collections
===========
no changes
