TreeProcessor is a complicated beast, and examining its classes does not offer many clues about what is going on. However, the key to understanding TreeProcessor is the treeprocessor-builtins.xml file.
We have an XML document with the following DTD:
<!DOCTYPE tree-processor [
<!ELEMENT tree-processor (language+)>
<!ELEMENT language (namespace, file, parameter, roles, nodes)>
<!ATTLIST language
name CDATA #REQUIRED
class CDATA #REQUIRED
pool-min CDATA #IMPLIED
pool-max CDATA #IMPLIED
>
<!ELEMENT namespace EMPTY>
<!ATTLIST namespace uri CDATA #REQUIRED>
<!ELEMENT file EMPTY>
<!ATTLIST file name CDATA #REQUIRED>
<!ELEMENT parameter EMPTY>
<!ATTLIST parameter element CDATA #REQUIRED>
<!ELEMENT roles (role+)>
<!ELEMENT role (hint*)>
<!ATTLIST role
name CDATA #REQUIRED
shorthand CDATA #REQUIRED
default-class CDATA #REQUIRED
>
<!ELEMENT hint EMPTY>
<!ATTLIST hint
shorthand CDATA #REQUIRED
class CDATA #REQUIRED
>
<!ELEMENT nodes (node+)>
<!ELEMENT node (allowed-children*, ignored-children*, forbidden-children*)>
<!ATTLIST node
name CDATA #REQUIRED
builder CDATA #REQUIRED
>
<!ELEMENT allowed-children (#PCDATA)>
<!ELEMENT ignored-children (#PCDATA)>
<!ELEMENT forbidden-children (#PCDATA)>
]>

Here is a mock XML file, slimmed down to its simplest state:
<tree-processor>
<language name="sitemap"
class="org.apache.cocoon.components.treeprocessor.sitemap.SitemapLanguage"
pool-min="1" pool-max="1">
<namespace uri="http://apache.org/cocoon/sitemap/1.0"/>
<file name="sitemap.xmap"/>
<parameter element="parameter"/>
<!-- roles skipped because they are irrelevant -->
<nodes>
<node name="pipelines"
builder="org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNodeBuilder">
<allowed-children>pipeline, handle-errors</allowed-children>
<ignored-children>component-configurations</ignored-children>
<forbidden-children>sitemap, components, pipelines</forbidden-children>
</node>
</nodes>
</language>
</tree-processor>

What is happening here is that we define a sitemap tree parser by first identifying how to recognize a sitemap: its XML namespace, its default file name, and the name of the "parameter" element (which has special meaning in TreeProcessor semantics). I skipped the roles definition because it won't be needed in Cocoon 2.2. For reference, it describes the default component types that the tree processor expects.
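To make the header part concrete, here is a minimal sketch of reading a `<language>` element's name, class, namespace URI, default file name and parameter element name with the standard JAXP DOM API. The `LanguageHeader` class and its `recognizes` check are hypothetical illustrations, not Cocoon's code; the assumption is simply that a document belongs to a language when its root element is in the declared namespace.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical holder for the header of one <language> definition (not a Cocoon class).
final class LanguageHeader {
    final String name;             // language/@name, e.g. "sitemap"
    final String className;        // language/@class
    final String namespaceUri;     // namespace/@uri
    final String defaultFile;      // file/@name, e.g. "sitemap.xmap"
    final String parameterElement; // parameter/@element

    private LanguageHeader(String name, String className, String namespaceUri,
                           String defaultFile, String parameterElement) {
        this.name = name;
        this.className = className;
        this.namespaceUri = namespaceUri;
        this.defaultFile = defaultFile;
        this.parameterElement = parameterElement;
    }

    // Parses an XML string into a DOM document.
    private static Document parseXml(String xml, boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware); // needed for getNamespaceURI() to return a value
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Returns the first descendant element with the given tag name.
    private static Element first(Element parent, String tag) {
        return (Element) parent.getElementsByTagName(tag).item(0);
    }

    // Reads the first <language> element of a tree-processor configuration.
    static LanguageHeader fromConfig(String configXml) throws Exception {
        Element language = first(parseXml(configXml, false).getDocumentElement(), "language");
        return new LanguageHeader(
                language.getAttribute("name"),
                language.getAttribute("class"),
                first(language, "namespace").getAttribute("uri"),
                first(language, "file").getAttribute("name"),
                first(language, "parameter").getAttribute("element"));
    }

    // Assumption: a document is in this language if its root element uses the declared namespace.
    boolean recognizes(String documentXml) throws Exception {
        Element root = parseXml(documentXml, true).getDocumentElement();
        return namespaceUri.equals(root.getNamespaceURI());
    }
}
```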
The nodes section is the heart of the system. It maps XML elements to Builder objects, which instantiate the processing nodes for those elements. The child elements "allowed-children", "ignored-children", and "forbidden-children" act as a "poor man's" DTD, so to speak; at the very least, they provide explicit processing hints that augment a DTD. In the example above, "pipeline" and "handle-errors" are child elements that are explicitly allowed and are handled from inside the "pipelines" node. The "component-configurations" element may appear as a child of "pipelines", but it is not processed. Lastly, the "forbidden-children" element identifies elements that cannot appear as children of "pipelines".
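The three child lists can be read as a simple classification rule. The sketch below is hypothetical (`ChildRules` and its `Verdict` enum are not Cocoon classes). It checks forbidden names first, so a name that appears in several lists is rejected, and it reports names found in no list as `UNLISTED`, since the text above does not say how those are treated.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of one <node> entry's child lists; not a Cocoon class.
final class ChildRules {
    enum Verdict { PROCESS, IGNORE, FORBIDDEN, UNLISTED }

    private final Set<String> allowed;
    private final Set<String> ignored;
    private final Set<String> forbidden;

    ChildRules(String allowed, String ignored, String forbidden) {
        this.allowed = parseList(allowed);
        this.ignored = parseList(ignored);
        this.forbidden = parseList(forbidden);
    }

    // Splits "a, b,c" into {"a", "b", "c"}: names separated by a comma and any whitespace.
    static Set<String> parseList(String text) {
        Set<String> names = new LinkedHashSet<>();
        if (text == null || text.trim().isEmpty()) {
            return names;
        }
        names.addAll(Arrays.asList(text.trim().split("\\s*,\\s*")));
        return names;
    }

    // Decides what the builder does with a child element of the given name.
    Verdict classify(String childName) {
        if (forbidden.contains(childName)) return Verdict.FORBIDDEN;
        if (allowed.contains(childName)) return Verdict.PROCESS;
        if (ignored.contains(childName)) return Verdict.IGNORE;
        return Verdict.UNLISTED; // handling of unlisted children is left to the caller
    }
}
```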
All the listed elements (separated by a comma and any amount of whitespace) must themselves be declared as nodes so that they can be processed.
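A sketch of that declaration check, assuming each node's three child lists are available as raw strings. `NodeDeclarationCheck` is a hypothetical helper, not part of Cocoon; it reports every listed child name that has no matching `<node>` declaration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical consistency check: every name listed in a node's
// allowed/ignored/forbidden children must itself be a declared <node>.
final class NodeDeclarationCheck {

    // Names are separated by a comma and any amount of whitespace.
    static List<String> names(String list) {
        String trimmed = list.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s*,\\s*"));
    }

    // childLists maps a node name to the raw text of its child lists.
    // Returns one "node -> child" entry per undeclared reference; empty means consistent.
    static List<String> undeclared(Set<String> declaredNodes, Map<String, List<String>> childLists) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : childLists.entrySet()) {
            for (String rawList : entry.getValue()) {
                for (String child : names(rawList)) {
                    if (!declaredNodes.contains(child)) {
                        problems.add(entry.getKey() + " -> " + child);
                    }
                }
            }
        }
        return problems;
    }
}
```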
In theory, XSP pages *could* be implemented with the TreeBuilder, but in practice you cannot predict the schemas used for elements other than the XSP-specific ones. The TreeProcessor is best suited to fully encapsulated schemas that act as a language of their own, like the Sitemap.
This, at least, is the basic theory behind the TreeProcessor, as far as I can tell. Please let me know if I am missing something.
As for the implementation, the TreeBuilder creates a hierarchy of ECM (ExcaliburComponentManager) instances that add any necessary components and Builder components. The most troublesome part of this is the use of the Recomposable interface.
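One way to picture the hierarchy of managers is a child that registers its own components and delegates every other lookup to its parent. `ScopedManager` below is a hypothetical, heavily simplified stand-in, not the Avalon `ComponentManager` or ECM API; it only illustrates the lookup-with-fallback idea.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a component manager hierarchy
// (not the Avalon/Excalibur API): each child adds its own components
// and falls back to its parent for everything else.
final class ScopedManager {
    private final ScopedManager parent;             // null for the root manager
    private final Map<String, Object> local = new HashMap<>();

    ScopedManager(ScopedManager parent) {
        this.parent = parent;
    }

    // Registers a component under a role name in this scope only.
    void add(String role, Object component) {
        local.put(role, component);
    }

    // Looks the role up locally first, then walks up the parent chain.
    Object lookup(String role) {
        Object found = local.get(role);
        if (found != null) {
            return found;
        }
        if (parent != null) {
            return parent.lookup(role);
        }
        throw new IllegalArgumentException("No component for role: " + role);
    }
}
```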
The whole problem with the Recomposable interface as it is used here is that the child and parent component managers constantly overwrite each other. This is a serious conflict, and it will break as soon as we proxy components. Proxied components hide their lifecycle interfaces so that no rogue client can usurp the component manager or any other part of a component's lifecycle, which makes for a more stable system.
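A small sketch of both halves of the problem, using hypothetical types: `Rebindable` stands in for Recomposable (the real Avalon interface takes a component manager, not a string), and `java.lang.reflect.Proxy` is one standard JDK way to build a proxy that exposes only a work interface. The `main` method shows that when two managers recompose one shared instance, whichever call comes last wins, and that the proxy hides the lifecycle interface entirely.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Hypothetical work interface that clients are supposed to use.
interface WorkService {
    String describe();
}

// Hypothetical lifecycle hook standing in for Recomposable.
interface Rebindable {
    void recompose(String managerName);
}

// A shared component that remembers whichever manager recomposed it last.
final class SharedComponent implements WorkService, Rebindable {
    private String managerName = "none";

    @Override public void recompose(String managerName) { this.managerName = managerName; }
    @Override public String describe() { return "bound to " + managerName; }
}

final class ProxyDemo {
    // Wraps a component so callers see only WorkService; Rebindable is not reachable.
    static WorkService hideLifecycle(WorkService target) {
        InvocationHandler handler = (proxy, method, args) -> {
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the component's own exception
            }
        };
        return (WorkService) Proxy.newProxyInstance(
                WorkService.class.getClassLoader(), new Class<?>[] { WorkService.class }, handler);
    }

    public static void main(String[] args) {
        SharedComponent shared = new SharedComponent();
        shared.recompose("child manager");  // the child scope binds the component...
        shared.recompose("parent manager"); // ...then the parent overwrites that binding
        System.out.println(shared.describe()); // bound to parent manager
        WorkService hidden = hideLifecycle(shared);
        System.out.println(hidden instanceof Rebindable); // false
    }
}
```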
The Recomposable calls scare me because they look like something that would work under low load but break down under high load. With something like Cocoon, that is a big issue. I don't have any numbers to show, but it is a feeling I get from looking at the code.
As for the nitty-gritty of how the node tree is built and run, I am still somewhat fuzzy. I know we have a set of NodeBuilders, which instantiate the Nodes, which in turn are special components. The NodeBuilders can be viewed as a sort of intelligent object creator, but I am not sure whether the Nodes are components with relaxed requirements on the constructor or simple objects. The Nodes are what do the hard work. Once the tree is built, the builders are no longer necessary (unless you want to keep building new trees).
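The build/run split described above can be sketched as follows. Every type here (`Conf`, `RequestNode`, `NodeMaker`, `TreeMaker`, `Makers`) is a hypothetical simplification for illustration, not Cocoon's API, and the exact-match "match" node is invented to keep the example small. The tree maker looks up a maker by element name, much like the `<node name="..." builder="...">` mapping, and recurses into the children; only the resulting nodes handle requests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// A parsed configuration element: its name, one attribute value, and its children.
final class Conf {
    final String name;
    final String value;
    final List<Conf> children;

    Conf(String name, String value, List<Conf> children) {
        this.name = name;
        this.value = value;
        this.children = children;
    }
}

// Runtime object doing the request-time work; returns null if it does not handle the URI.
interface RequestNode {
    String invoke(String uri);
}

// Build-time object that turns one configuration element into a RequestNode.
interface NodeMaker {
    RequestNode build(Conf conf, TreeMaker tree);
}

// Picks a maker by element name and recurses into child elements.
final class TreeMaker {
    private final Map<String, NodeMaker> makers;

    TreeMaker(Map<String, NodeMaker> makers) {
        this.makers = makers;
    }

    RequestNode build(Conf conf) {
        NodeMaker maker = makers.get(conf.name);
        if (maker == null) {
            throw new IllegalArgumentException("No maker declared for <" + conf.name + ">");
        }
        return maker.build(conf, this);
    }

    List<RequestNode> buildChildren(Conf conf) {
        List<RequestNode> nodes = new ArrayList<>();
        for (Conf child : conf.children) {
            nodes.add(build(child));
        }
        return nodes;
    }
}

final class Makers {
    // "pipelines": asks each child node in turn until one handles the request.
    static final NodeMaker PIPELINES = (conf, tree) -> {
        List<RequestNode> kids = tree.buildChildren(conf);
        return uri -> {
            for (RequestNode kid : kids) {
                String result = kid.invoke(uri);
                if (result != null) {
                    return result;
                }
            }
            return null;
        };
    };

    // "match": handles the request only when the URI equals its pattern exactly.
    static final NodeMaker MATCH = (conf, tree) -> {
        String pattern = conf.value;
        return uri -> uri.equals(pattern) ? "matched " + pattern : null;
    };
}
```

Once `build` returns, the `TreeMaker` and its makers can be dropped; only the returned node tree is needed to serve requests, which mirrors the point that builders are unnecessary once the tree exists.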
I know I want to have a new Container per sitemap, but I think I need some help mapping that onto this problem space. Ovidiu, do you think you could spare at least some guidance?
--
"They that give up essential liberty to obtain a little temporary safety
deserve neither liberty nor safety."
- Benjamin Franklin