[FYI] How TreeProcessor Works

Berin Loritsch Fri, 24 Oct 2003 06:51:59 -0700

TreeProcessor is a complicated beast, and examining the classes alone
gives few clues to what is going on.  The key to understanding
TreeProcessor is the treeprocessor-builtins.xml file.

We have an XML document with the following DTD:

<!DOCTYPE tree-processor [
  <!ELEMENT tree-processor (language+)>
  <!ELEMENT language (namespace, file, parameter, roles, nodes)>
  <!ATTLIST language
    name CDATA #REQUIRED
    class CDATA #REQUIRED
    pool-min CDATA #IMPLIED
    pool-max CDATA #IMPLIED
  >
  <!ELEMENT namespace EMPTY>
  <!ATTLIST namespace uri CDATA #REQUIRED>
  <!ELEMENT file EMPTY>
  <!ATTLIST file name CDATA #REQUIRED>
  <!ELEMENT parameter EMPTY>
  <!ATTLIST parameter element CDATA #REQUIRED>
  <!ELEMENT roles (role+)>
  <!ELEMENT role (hint*)>
  <!ATTLIST role
    name CDATA #REQUIRED
    shorthand CDATA #REQUIRED
    default-class CDATA #REQUIRED
  >
  <!ELEMENT hint EMPTY>
  <!ATTLIST hint
    shorthand CDATA #REQUIRED
    class CDATA #REQUIRED
  >
  <!ELEMENT nodes (node+)>
  <!ELEMENT node (allowed-children*, ignored-children*, forbidden-children*)>
  <!ATTLIST node
    name CDATA #REQUIRED
    builder CDATA #REQUIRED
  >
  <!ELEMENT allowed-children (#PCDATA)>
  <!ELEMENT ignored-children (#PCDATA)>
  <!ELEMENT forbidden-children (#PCDATA)>
]>

A mock document, slimmed down to the simplest case, looks like this:

<tree-processor>
  <language name="sitemap"
      class="org.apache.cocoon.components.treeprocessor.sitemap.SitemapLanguage"
      pool-min="1" pool-max="1">

    <namespace uri="http://apache.org/cocoon/sitemap/1.0"/>
    <file name="sitemap.xmap"/>
    <parameter element="parameter"/>

    <nodes>
      <node name="pipelines"
          builder="org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNodeBuilder">
        <allowed-children>pipeline, handle-errors</allowed-children>
        <ignored-children>component-configurations</ignored-children>
        <forbidden-children>sitemap, components, pipelines</forbidden-children>
      </node>
    </nodes>
  </language>
</tree-processor>

What is happening here is that we define a sitemap tree parser by first
identifying how to recognize the sitemap: the namespace for the XML,
the default file name, and how to recognize the "parameter" element
(special to TreeProcessor semantics).  I skipped the roles definition
because in Cocoon 2.2 it won't be needed.  However, it describes the
default types of components that the tree processor expects.

The Nodes section is the heart of the system.  It maps XML elements to
Builder objects which perform some sort of logic.  The child elements
"allowed-children", "ignored-children", and "forbidden-children" act as
a "poor man's" DTD so to speak.  At least they provide some explicit
processing hints that augment a DTD.  In the example above, the
"pipeline" and "handle-errors" are child nodes that are explicitly
allowed and handled from inside the "pipelines" node.  The
"component-configurations" node is allowed to exist as a child of
the "pipelines" node, but no processing occurs.  Lastly, the
"forbidden-children" element identifies nodes that cannot exist as
a child of the "pipelines" node.
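
The three lists can be read as a small lookup per node.  Below is a minimal
sketch of that idea, not Cocoon code: ChildRules and Verdict are hypothetical
names, and the UNDECLARED result is an assumption, since nothing above says
what happens to a child that appears in none of the lists.

```java
import java.util.Set;

/** Hypothetical model of one <node> declaration; not a Cocoon class. */
final class ChildRules {
    enum Verdict { ALLOWED, IGNORED, FORBIDDEN, UNDECLARED }

    private final Set<String> allowed;
    private final Set<String> ignored;
    private final Set<String> forbidden;

    ChildRules(Set<String> allowed, Set<String> ignored, Set<String> forbidden) {
        this.allowed = allowed;
        this.ignored = ignored;
        this.forbidden = forbidden;
    }

    /** Classifies a child element name against the three lists. */
    Verdict check(String childName) {
        if (forbidden.contains(childName)) return Verdict.FORBIDDEN;
        if (ignored.contains(childName)) return Verdict.IGNORED;
        if (allowed.contains(childName)) return Verdict.ALLOWED;
        // Assumption: names in none of the lists are reported separately.
        return Verdict.UNDECLARED;
    }
}
```

With the lists from the "pipelines" example, check("pipeline") gives ALLOWED,
check("component-configurations") gives IGNORED, and check("sitemap") gives
FORBIDDEN.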

All the listed elements (separated by a comma and any amount of
whitespace) must be declared nodes so that they can be processed.
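
As a rough illustration of that rule, the sketch below splits a list on commas,
trims the whitespace, and reports any name that has no <node> declaration.
ChildLists and both method names are hypothetical; how TreeProcessor actually
performs this check is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for the comma-separated child lists; not a Cocoon class. */
final class ChildLists {
    /** Splits "a, b ,c" on commas, dropping surrounding whitespace and empty entries. */
    static List<String> split(String text) {
        List<String> names = new ArrayList<>();
        for (String part : text.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    /** Returns the listed names that have no matching <node> declaration. */
    static List<String> undeclared(String text, Set<String> declaredNodes) {
        List<String> missing = new ArrayList<>();
        for (String name : split(text)) {
            if (!declaredNodes.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

Run against the slimmed-down example, where "pipelines" is the only declared
node, every other listed name would be reported as missing.  That is expected
for the mock and is exactly what the full file has to avoid.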

In theory, XSP pages *could* be implemented with the TreeBuilder, but
in practice, you cannot predict the schemas used for elements other
than the XSP specific ones.  The TreeProcessor is best suited for fully
encapsulated schemas that act as a sort of language like the Sitemap.

This at least is the base theory behind the TreeProcessor--as far as I can
tell.  Please let me know if I am missing it somewhere.

As to implementation, the TreeBuilder creates a hierarchy of ECM
implementations that add any necessary components and Builder components.
The particularly troublesome portion of this is the use of the Recomposable
interface.

The whole issue with the Recomposable interface as it is written here is that
the child and parent component managers are constantly overwriting each other.
This is a serious conflict, and it will break as soon as we proxy components.
The proxied components hide any lifecycle interfaces so that no rogue client
can usurp the component manager, or any other part of the lifecycle of a
component, and provide for a more stable system.
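
To make the proxying point concrete, here is a minimal sketch using
java.lang.reflect.Proxy.  Every type in it is a hypothetical stand-in:
Lifecycle plays the role of an interface like Recomposable, and Greeter is an
arbitrary work interface.  It is not how any particular container does it,
only a demonstration that a proxy built from the work interface alone cannot
be cast to the lifecycle interface.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

/** Stand-in for a lifecycle interface such as Recomposable; hypothetical. */
interface Lifecycle {
    void recompose(Object manager);
}

/** Hypothetical work interface, the only thing clients should see. */
interface Greeter {
    String greet(String name);
}

/** A component that implements both the work and the lifecycle interface. */
final class RealGreeter implements Greeter, Lifecycle {
    Object manager = "parent";

    public String greet(String name) {
        return "hello " + name;
    }

    public void recompose(Object manager) {
        this.manager = manager;
    }
}

final class WorkOnlyProxy {
    /** Wraps target so that only workInterface is visible on the result. */
    static <T> T wrap(Class<T> workInterface, T target) {
        InvocationHandler handler = (proxy, method, args) -> {
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // surface the component's own exception
            }
        };
        Object proxy = Proxy.newProxyInstance(
                workInterface.getClassLoader(),
                new Class<?>[] { workInterface },
                handler);
        return workInterface.cast(proxy);
    }
}
```

A client holding WorkOnlyProxy.wrap(Greeter.class, new RealGreeter()) can call
greet(), but an instanceof Lifecycle test on it is false, so any code that
relies on calling recompose() through the handed-out reference stops working.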

The recomposable calls scare me because they look like something that would
work under low load, but would break down under high load.  With something
like Cocoon that is a big issue.  I don't have any numbers to show everyone,
but it is just a feeling I get by looking at the code.

As to the nitty gritty details of how the node tree is built and run, I am
still somewhat fuzzy on the details.  I know we have a bunch of NodeBuilders,
which instantiate the Nodes, which in turn are special components.  The
NodeBuilders can be viewed as a sort of intelligent object creator, but I
am not sure whether the Nodes are components with relaxed requirements on
the constructor, or if the Nodes are simple objects.  Those Nodes are what
do the hard work.  Once the tree is built, the builders are no longer
necessary (unless you want to keep building new trees).
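
The builder/node split described above can be sketched roughly as follows.
This is only my reading of the pattern, with hypothetical types throughout
(TreeNode, TreeNodeBuilder, Elem, TreeAssembler); the real Cocoon interfaces
have different signatures and deal with components and configuration that
this leaves out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical runtime node: the object that does the work after building. */
interface TreeNode {
    String invoke();
}

/** Hypothetical builder: turns one parsed element into a TreeNode. */
interface TreeNodeBuilder {
    TreeNode build(Elem element, TreeAssembler assembler);
}

/** Minimal stand-in for a parsed XML element. */
final class Elem {
    final String name;
    final List<Elem> children;

    Elem(String name, Elem... children) {
        this.name = name;
        this.children = List.of(children);
    }
}

/** Looks up the builder declared for each element name and delegates to it. */
final class TreeAssembler {
    private final Map<String, TreeNodeBuilder> builders;

    TreeAssembler(Map<String, TreeNodeBuilder> builders) {
        this.builders = builders;
    }

    TreeNode build(Elem element) {
        TreeNodeBuilder builder = builders.get(element.name);
        if (builder == null) {
            throw new IllegalArgumentException(
                    "no builder declared for <" + element.name + ">");
        }
        return builder.build(element, this);
    }

    /** Builds every child of element, in document order. */
    List<TreeNode> buildChildren(Elem element) {
        List<TreeNode> nodes = new ArrayList<>();
        for (Elem child : element.children) {
            nodes.add(build(child));
        }
        return nodes;
    }
}
```

The nodes returned by build() hold no reference to the assembler or the
builders unless a builder chooses to capture one, which matches the point that
the builders can be dropped once the tree exists.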

I know I want to have a new Container per sitemap, but I think I need some
help in mapping it to this problem space.  Ovidiu, do you think you could at
least spare some guidance?

--

"They that give up essential liberty to obtain a little temporary safety
 deserve neither liberty nor safety."
                - Benjamin Franklin
