The current discussion about Cocoon database connections, and some frustration about the complexity of connecting Woody-based webapps to XML, make me think that it is time for a new discussion about the principles of Cocoon input handling (see [1] for an earlier discussion).

But before discussing input handling I would like to recall briefly why Cocoon is so great for output handling (publishing).


Why Cocoon rocks for publishing
-------------------------------

Cocoon is based on three great ideas: XML-adaptors, XML-pipelines and the sitemap. Here we will discuss the first two.

If you have N different input formats and M output formats, you need N*M converters to convert from every input format to every output format. This complexity can be reduced to N+M by choosing a standard format (e.g. XML) and performing the conversion in two steps: first from the input format to the standard format, and then from the standard format to the output format. In Cocoon, generators take care of the first step and serializers of the second. If we only have a few input and output formats, the extra complexity of the two-step process is probably not worthwhile. But as we add formats it becomes increasingly painful to write N new converters for each new output format and M new converters for each new input format.
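
To make the arithmetic concrete, here is a minimal Java sketch. The class and method names are invented for this illustration and are not part of Cocoon.

```java
class ConverterCount {
    // Direct conversion: one converter per (input format, output format) pair.
    static int direct(int inputs, int outputs) {
        return inputs * outputs;
    }

    // Via a common format: one adaptor per input format plus one per output format.
    static int viaCommonFormat(int inputs, int outputs) {
        return inputs + outputs;
    }

    public static void main(String[] args) {
        // With 2 inputs and 2 outputs the two approaches cost the same.
        System.out.println(direct(2, 2) + " vs " + viaCommonFormat(2, 2));
        // With 5 inputs and 4 outputs the common format clearly wins.
        System.out.println(direct(5, 4) + " vs " + viaCommonFormat(5, 4));
    }
}
```

With two formats on each side nothing is gained (4 vs 4), while with five inputs and four outputs the count drops from 20 to 9.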

Having a common format (XML) also makes it worthwhile to write tools that use that format both as input and as output (e.g. XSLT), and we can use the pipes and filters pattern to build complex transformations out of smaller, specialized, reusable filters.
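
The pipes and filters idea can be sketched in a few lines of Java. This is only an illustration: plain strings stand in for the common format (Cocoon pipelines pass SAX events, not strings), and PipelineSketch and the three filters are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

class PipelineSketch {
    // Compose filters left to right; each filter consumes the previous output.
    static UnaryOperator<String> pipeline(List<UnaryOperator<String>> filters) {
        return input -> {
            String current = input;
            for (UnaryOperator<String> filter : filters) {
                current = filter.apply(current);
            }
            return current;
        };
    }

    public static void main(String[] args) {
        // Three small filters, each reusable in other pipelines.
        UnaryOperator<String> trim = String::trim;
        UnaryOperator<String> upper = String::toUpperCase;
        UnaryOperator<String> wrap = s -> "<title>" + s + "</title>";

        UnaryOperator<String> p = pipeline(Arrays.asList(trim, upper, wrap));
        System.out.println(p.apply("  hello cocoon "));
    }
}
```

Because every filter reads and writes the same format, any filter can be reordered, dropped or reused without touching the others.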


Dataflow in (web)apps
---------------------

Now, what about (web)applications in Cocoon? Here the general pattern is: get input in some format from the user and store it in some format that the business logic can use. When the business logic has done its work, the ordinary (web)publishing mechanism, i.e. a pipeline, can be used for showing the result.

So, looking at the data flow, a (simplified) view of publishing is:

[Input format -> Output format]

and for (web)apps:

[Input format (user) -> Output format (storage)] -> webapp -> [Input format (storage) -> Output format]

As we can see, publishing has one conversion step and (web)apps have two. In [1] I talked about input and output pipelines for the two conversion steps.

Comparing input and output pipelines, input handling has one main source of extra complexity: we cannot trust user input. We need to check that the input is correct and take different actions depending on the result, so the control structure becomes more complicated when we have user input. A further reason for detailed control of user input is that output tends to go from strongly typed data (databases, Java etc.) to loosely typed data; in presentation most things are strings. Input tends to have the opposite requirement: from strings to typed data.
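
As a small illustration of the "strings to typed data" direction and of the extra control flow it causes, here is a Java sketch. Everything in it is hypothetical: the "age" parameter, the 0 to 150 range and the class names are assumptions made up for the example, not Cocoon APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TypedInputSketch {
    // Outcome of converting one untrusted string: a typed value or an error.
    static final class Result {
        final Object value;  // typed value, null on failure
        final String error;  // error message, null on success
        Result(Object value, String error) {
            this.value = value;
            this.error = error;
        }
        boolean isValid() {
            return error == null;
        }
    }

    // Convert a raw request parameter to an Integer, validating on the way.
    static Result parseAge(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return new Result(null, "age is required");
        }
        try {
            int age = Integer.parseInt(raw.trim());
            if (age < 0 || age > 150) { // assumed range, for illustration only
                return new Result(null, "age out of range: " + age);
            }
            return new Result(age, null);
        } catch (NumberFormatException e) {
            return new Result(null, "age is not a number: " + raw);
        }
    }

    public static void main(String[] args) {
        Map<String, String> request = new LinkedHashMap<>();
        request.put("age", "42");
        Result r = parseAge(request.get("age"));
        // The control flow branches on validity, much as a flowscript would.
        if (r.isValid()) {
            System.out.println("store " + r.value);
        } else {
            System.out.println("redisplay form: " + r.error);
        }
    }
}
```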


Is Cocoon that great for input handling?
----------------------------------------

We can see that for input we need three things: more sophisticated control, which is solved with flowscripts; a mechanism for describing and validating the form of the input data; and possibly a mechanism for adding type information to input data.

How is this handled in Cocoon?

In the beginning there was only one input format: request parameters, i.e. a hashmap. It can be checked by a FormValidatorAction and stored in a database by a [Modular]DatabaseAction, or used in Java code by writing a specialized action.

Now, unordered and unstructured input data, like a hashmap, is not enough for more advanced user interfaces. [XML|JX]Forms, and later Woody, introduced a mapping from path-like request parameter names to data structures: in [XML|JX]Forms by writing to and reading from DOM or Java bean structures directly, and in Woody by introducing a form model, the widget hierarchy.

In Woody, the structure and data types of the widget hierarchy, among other things, are defined in a widget definition file. Woody also contains a validation mechanism that works on the widget hierarchy, and bidirectional conversion between the widget hierarchy and Java data structures and DOM respectively.

Request parameters and "structured" request parameters are not the only forms of user input. XML is used for WebDAV and web service applications, and XML is also becoming more common as input from more advanced user clients. And with new environments like mail, CLI, JMS and possibly more, we will get even more user input formats. As storage formats we have various database types, the file system, DOM etc.
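
The step from path-like parameter names to a data structure can be sketched as follows. This is not how [XML|JX]Forms or Woody implement it; it is a minimal illustration using the standard JAXP DOM API. The class name and the "/order/..." parameter names are made up, and the sketch assumes that every path is non-empty and starts with the same root segment.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class PathParamsToDom {
    // Build a DOM from parameters whose names are slash-separated paths.
    static Document toDom(Map<String, String> params) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        for (Map.Entry<String, String> entry : params.entrySet()) {
            // Strip leading slashes, then walk or create one element per step.
            String[] steps = entry.getKey().replaceAll("^/+", "").split("/");
            Node current = doc;
            for (String step : steps) {
                current = childOrCreate(doc, current, step);
            }
            current.setTextContent(entry.getValue());
        }
        return doc;
    }

    // Return the first child element with this name, creating it if absent.
    static Element childOrCreate(Document doc, Node parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(name)) {
                return (Element) n;
            }
        }
        Element created = doc.createElement(name);
        parent.appendChild(created);
        return created;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("/order/customer/name", "Ada");
        params.put("/order/customer/city", "Lund");
        params.put("/order/item", "book");
        Document doc = toDom(params);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```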

We see that the situation for input handling has become quite similar to that for output: many input formats and many output formats. But in contrast to the output scenario, we have no common design pattern for handling the complexity. In some cases we have components that convert directly from input format to storage format. In other cases we use a format between input and storage, but this format can be a hashmap, Java beans, the Woody widget hierarchy, or XML in the form of DOM or SAX. In some of these cases we also have validation mechanisms for the middle format.

This lack of a commonly accepted pattern for input handling leads to less reuse, multiple components that do similar things, and the lack of a common focal point. An example of this is the discussion about Cocoon/relational database coupling: we have multiple ways to go from RDBs to XML but no components for the opposite direction, while we have actions that go in both directions between hashmaps and RDBs and in both directions between Java data structures and RDBs.


The solution ;)
---------------

IMO we have an obvious solution to this situation right before our eyes: adapt the patterns that we already use for output handling, i.e. adaptors and pipelines, to input handling as well. To do this we must decide on a common format. The candidates are: hashmaps, Java beans, the Woody widget hierarchy and XML.

We already have an action-based framework for using hashmaps, but it is questionable whether unstructured data is enough for more advanced applications. Java beans require IMO too much work, and they would also require all Cocoon users to be Java programmers. The remaining candidates, Woody widget hierarchies and XML, have a lot in common. Both are hierarchical data structures. Both contain (or can contain) typed data (an XML document together with a schema is a typed data structure).

While the Woody widget structure has some things in its favor (we already have working validation in it and an easy connection to Java data types), I think that using XML has _huge_ advantages:

* Cocoon is an XML-based framework and uses XML as its internal format almost everywhere. When one uses the Woody widget hierarchy, one has to translate back and forth between XML and Woody all the time, which at least IMO is a waste of time.

* XML is standardized, and there is an enormous number of tools that use it. For Woody widgets, we have to do everything ourselves.

* There are well-designed schema languages for XML: XML Schema and, if you don't like that, RELAX NG. As the rest of the XML world uses XML data types, we get an impedance mismatch between the Woody data types and XML.


What does this mean in practice?
--------------------------------

So far I have (fairly strongly, I suppose ;) ) suggested that we should use XML as the standardized internal format for all input handling in Cocoon, so that we can use the adaptor and pipes and filters patterns for input as well as output. What does this mean in practice?

In part we already have the mechanisms, e.g. one can process the input in a pipeline that is called with processPipelineTo[DOM|SAX|Stream] from within flowscripts. The pipeline input can come from request parameters or, in the servlet environment, from the inputStream via "module:inputStream", and be adapted to XML in any generator.

But in many cases SAX-based XML, as used in pipelines, is not enough: we need a data structure, i.e. DOM. This leads to flowscript components that read some input format into a DOM, and components that go from a DOM to some output format or some store. We will also need flowscript components that go from DOM to DOM.

Untyped XML is not enough, so we also need typed XML. Here I am thinking of a DOM with a schema attached to it, so that one can [re]validate the DOM, ask the nodes and the leaves whether they are valid and what datatype they have, and also access valid leaves in terms of the corresponding Java data type. I think something like this should be possible to build by combining a DOM implementation, e.g. Xerces, with Sun Multi Schema Validator (MSV) and XSDLIB [2].

CForms should IMO use the typed DOM described above as its form model instead of the current proprietary Java structure.

To make DOM easy to use within flowscripts, it would be nice to write Rhino binding code (a scriptable object) so that one can use the ECMAScript API for DOM. It is also a good idea to use a DOM implementation that implements DOM events, so that one can write flowscript code in the same style as client-side JS.
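
A very rough approximation of the "DOM with a schema attached" idea can be sketched with the standard JAXP validation API (javax.xml.validation) instead of MSV and XSDLIB. Note the limits: this sketch only validates a whole DOM against an XML Schema and then reads one leaf as a Java int; it does not offer per-node validity or datatype queries as proposed above. The class name, the tiny schema and the "order"/"quantity" vocabulary are assumptions made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class TypedDomSketch {
    // Hypothetical schema: an order with a single integer quantity.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='order'><xs:complexType><xs:sequence>"
        + "<xs:element name='quantity' type='xs:int'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to validate a DOM against a schema
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }

    // Returns null when the DOM is valid, otherwise the first error message.
    static String validate(Document doc) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new DOMSource(doc));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Typed access to a leaf; only meaningful after validate() returned null.
    static int quantity(Document doc) {
        String text = doc.getElementsByTagName("quantity").item(0).getTextContent();
        return Integer.parseInt(text.trim());
    }

    public static void main(String[] args) {
        Document good = parse("<order><quantity>3</quantity></order>");
        System.out.println(validate(good) == null ? "valid, quantity=" + quantity(good) : "invalid");
        Document bad = parse("<order><quantity>many</quantity></order>");
        System.out.println(validate(bad) == null ? "valid" : "invalid");
    }
}
```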

--- o0o ---

To summarize: I think that we could make Cocoon considerably easier to use for (web)apps, and increase the reuse of components, by using the XML adaptor and pipes and filters patterns for input as well.

WDYT?

/Daniel

References
----------

[1] [RT] Input Pipelines (long)
http://marc.theaimsgroup.com/?t=104008605100003&r=1&w=2

[2] MSV
https://msv.dev.java.net/



