TreeProcessor is a complicated beast, so examining the classes does not lend any clues to what is going on. However, the key to understanding TreeProcessor is the treeprocessor-builtins.xml file.
?? Haven't you seen my explanation to your previous request?
See http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=106649931022515&w=2
We have an XML document with the following DTD:
<snip/>
So with a mock XML slimmed down to just the simplest state:
<tree-processor>
<language name="sitemap"
class="org.apache.cocoon.components.treeprocessor.sitemap.SitemapLanguage"
pool-min="1" pool-max="1">
<namespace uri="http://apache.org/cocoon/sitemap/1.0"/>
<file name="sitemap.xmap"/>
<parameter element="parameter"/>
<!-- roles skipped because they are irrelevant -->
<nodes>
<node name="pipelines" builder="org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNodeBuilder">
<allowed-children>pipeline, handle-errors</allowed-children>
<ignored-children>component-configurations</ignored-children>
<forbidden-children>sitemap, components, pipelines</forbidden-children>
</node>
</nodes>
</language>
</tree-processor>
What is happening here is that we define a sitemap tree parser by first identifying how to recognize the sitemap: the namespace for the XML, the default file name, how to recognize the "parameter" element (special to TreeProcessor semantics). I skipped the roles definition because in Cocoon 2.2 it won't be needed. However, it describes the default types of components that the tree processor expects.
Correct.
The Nodes section is the heart of the system. It maps XML elements to Builder objects which perform some sort of logic. The child elements "allowed-children", "ignored-children", and "forbidden-children" act as a "poor man's" DTD so to speak. At least they provide some explicit processing hints that augment a DTD. In the example above, the "pipeline" and "handle-errors" are child nodes that are explicitly allowed and handled from inside the "pipelines" node. The "component-configurations" node is allowed to exist as a child of the "pipelines" node, but no processing occurs. Lastly, the "forbidden-children" element identifies nodes that cannot exist as a child of the "pipelines" node.
Correct.
All the enumerated elements (separated by a comma and any amount of whitespace) must be declared nodes so that they can be processed.
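To make the three child lists concrete, here is a rough sketch of how such a list could be split and consulted. ChildRules and its Verdict enum are hypothetical names made up for illustration, not Cocoon classes, and the precedence between the lists (forbidden first, then ignored, then allowed) is an assumption rather than something taken from the TreeProcessor code.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper (not a Cocoon class): models the three child lists of a <node> declaration.
final class ChildRules {
    enum Verdict { ALLOWED, IGNORED, FORBIDDEN, UNKNOWN }

    private final Set<String> allowed;
    private final Set<String> ignored;
    private final Set<String> forbidden;

    ChildRules(String allowed, String ignored, String forbidden) {
        this.allowed = split(allowed);
        this.ignored = split(ignored);
        this.forbidden = split(forbidden);
    }

    // Split on a comma surrounded by any amount of whitespace, dropping empty entries.
    static Set<String> split(String list) {
        Set<String> names = new LinkedHashSet<>();
        for (String name : list.trim().split("\\s*,\\s*")) {
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    // Assumed precedence: forbidden, then ignored, then allowed.
    Verdict classify(String childName) {
        if (forbidden.contains(childName)) return Verdict.FORBIDDEN;
        if (ignored.contains(childName)) return Verdict.IGNORED;
        if (allowed.contains(childName)) return Verdict.ALLOWED;
        return Verdict.UNKNOWN;
    }
}
```

With the "pipelines" declaration above, classify("pipeline") would come back as ALLOWED, classify("component-configurations") as IGNORED, and classify("sitemap") as FORBIDDEN.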
In theory, XSP pages *could* be implemented with the TreeBuilder, but in practice, you cannot predict the schemas used for elements other than the XSP specific ones. The TreeProcessor is best suited for fully encapsulated schemas that act as a sort of language like the Sitemap.
XSP also has the particularity of allowing embedded java code, meaning it requires the production of java code and thus cannot be implemented with a tree-evaluation based approach.
This at least is the base theory behind the TreeProcessor--as far as I can tell. Please let me know if I am missing it somewhere.
You're totally right!
As to implementation, the TreeBuilder creates a hierarchy of ECM implementations that add any necessary components and Builder components. The particularly troublesome portion of this is the use of the Recomposable interface.
The whole issue with the Recomposable interface as it is written here is that the child and parent component managers are constantly overwriting each other. This is a serious conflict, and it will break as soon as we proxy components. The proxied components hide any lifecycle interfaces so that no rogue client can usurp the component manager, or any other part of the lifecycle of a component, and provide for a more stable system.
The Recomposable interface is used here so that node builders know the component manager of the tree that is being built, because this is where the builders should look up components when they need some.
I admit this is not clean, as it mixes the container which manages the node builders (built with the treeprocessor-builtins.xml file) and the container in which the tree that is being built has to live.
A solution can be to add a getTreeManager() method to the TreeBuilder interface, that would return the manager for the tree being built (i.e. the one defined by <map:components>).
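Roughly, the idea could look like the sketch below. Only getTreeManager() comes from the proposal itself; Manager is a reduced stand-in for the Avalon ComponentManager, and ExampleNodeBuilder and MapManager are hypothetical names used just to show a builder asking the TreeBuilder for the tree's manager instead of being recomposed.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the Avalon ComponentManager, reduced to what this sketch needs.
interface Manager {
    Object lookup(String role);
}

// Only the proposed accessor is shown; the real TreeBuilder interface has other methods.
interface TreeBuilder {
    // Manager of the tree being built, i.e. the one defined by <map:components>.
    Manager getTreeManager();
}

// Hypothetical node builder: it keeps its own container untouched and asks the
// TreeBuilder for the tree's manager whenever it needs a sitemap component.
final class ExampleNodeBuilder {
    private final TreeBuilder treeBuilder;

    ExampleNodeBuilder(TreeBuilder treeBuilder) {
        this.treeBuilder = treeBuilder;
    }

    Object resolve(String role) {
        return treeBuilder.getTreeManager().lookup(role);
    }
}

// Hypothetical map-backed manager, only here so the sketch can be exercised.
final class MapManager implements Manager {
    private final Map<String, Object> components = new HashMap<>();

    MapManager with(String role, Object component) {
        components.put(role, component);
        return this;
    }

    @Override
    public Object lookup(String role) {
        Object component = components.get(role);
        if (component == null) {
            throw new IllegalArgumentException("No component for role " + role);
        }
        return component;
    }
}
```

The point of the design is that a builder never has its own manager swapped out: the two containers stay separate, and the only link between them is this one accessor.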
How does it sound?
The Recomposable calls scare me because they look like something that would work under low load, but would break down under high load. With something like Cocoon that is a big issue. I don't have any numbers to show everyone, but it is just a feeling I get by looking at the code.
You should not worry, since this is used only to _build_ the sitemap, i.e. at startup or when the sitemap file is changed.
As to the nitty gritty details of how the node tree is built and run, I am still somewhat fuzzy on the details. I know we have a bunch of NodeBuilders, which instantiate the Nodes, which in turn are special components. The NodeBuilders can be viewed as a sort of intelligent object creator, but I am not sure whether the Nodes are components with relaxed requirements on the constructor, or if the Nodes are simple objects. Those Nodes are what does the hard work. Once the tree is built, the builders are not necessary any more (unless you want to keep building new trees).
Please refer to my previous post mentioned above. Your analysis is right, and Nodes are in between components and simple objects: the DefaultTreeBuilder.setupNode() method will honor any lifecycle interface implemented by a node, and if the node implements Disposable, it is added to a list that is used to dispose a processing tree when needed (system shutdown or sitemap reload).
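As a simplified illustration of that behaviour (not the actual DefaultTreeBuilder code), the sketch below uses two stand-in lifecycle interfaces in place of the Avalon ones. SketchTreeBuilder, disposeTree() and disposableCount() are hypothetical names, and the stand-in initialize() does not declare a checked exception, to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-ins for two lifecycle interfaces; a real builder would check more of them.
interface Initializable {
    void initialize();
}

interface Disposable {
    void dispose();
}

// Hypothetical, much reduced tree builder.
final class SketchTreeBuilder {
    // Nodes that must be disposed when the processing tree is thrown away.
    private final List<Disposable> disposableNodes = new ArrayList<>();

    // Honor whatever lifecycle interfaces the node implements, then hand it back.
    <T> T setupNode(T node) {
        if (node instanceof Initializable) {
            ((Initializable) node).initialize();
        }
        if (node instanceof Disposable) {
            disposableNodes.add((Disposable) node);
        }
        return node;
    }

    // Called on system shutdown or sitemap reload in this sketch.
    void disposeTree() {
        for (Disposable node : disposableNodes) {
            node.dispose();
        }
        disposableNodes.clear();
    }

    int disposableCount() {
        return disposableNodes.size();
    }
}
```

A node that implements neither interface simply passes through setupNode() untouched, which is what lets plain objects and component-like nodes live in the same tree.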
I came to this NodeBuilder/Node pattern since Nodes need to behave like components but cannot be declared as such, as the configuration of a given node type (e.g. GenerateNode) highly depends on its environment (i.e. the corresponding markup in the sitemap file). Moreover, a single NodeBuilder implementation can produce nodes of different classes, also depending on the environment (see ActNodeBuilder or CallNodeBuilder).
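To illustrate the second point, here is a toy builder that returns a different node class depending on the markup it is given. All the names (Node, StaticValueNode, ResolvedValueNode, ExampleBuilder) and the rule it applies (does the "src" attribute contain a "{") are invented for the example; the real ActNodeBuilder and CallNodeBuilder make their own decisions.

```java
import java.util.Map;

// Hypothetical node contract, reduced to one method for the example.
interface Node {
    String describe();
}

// Node for an attribute value that can be used as-is.
final class StaticValueNode implements Node {
    private final String value;

    StaticValueNode(String value) {
        this.value = value;
    }

    @Override
    public String describe() {
        return "static:" + value;
    }
}

// Node for an attribute value that would need resolving at request time.
final class ResolvedValueNode implements Node {
    private final String expression;

    ResolvedValueNode(String expression) {
        this.expression = expression;
    }

    @Override
    public String describe() {
        return "resolved:" + expression;
    }
}

// Hypothetical builder: the node class is chosen from the element's attributes.
final class ExampleBuilder {
    Node buildNode(Map<String, String> attributes) {
        String src = attributes.get("src");
        if (src == null) {
            throw new IllegalArgumentException("Missing 'src' attribute");
        }
        // Invented rule: a "{" marks a value that cannot be fixed at build time.
        return src.contains("{") ? new ResolvedValueNode(src) : new StaticValueNode(src);
    }
}
```

Declaring the nodes themselves as components would not allow this, since the class to instantiate is only known once the builder has seen the markup.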
I know I want to have a new Container per sitemap, but I think I need some help in mapping it to this problem space. Ovidiu, do you think you could at least spare some guidance?
Ahem... I guess Ovidiu isn't the right person for this stuff, but I hope my explanation will help ;-)
Sylvain
-- Sylvain Wallez Anyware Technologies http://www.apache.org/~sylvain http://www.anyware-tech.com { XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects } Orixo, the opensource XML business alliance - http://www.orixo.com
