Re: [RT] Comparing Woody & XMLForm : towards a unified form handling(long)

Marc Portier Tue, 22 Jul 2003 02:53:25 -0700

Sylvain et al.,


We were patiently expecting that once it would be that:
Sylvain Wallez wrote:

Hi all,

first of all: great work and gentle introduction, many thx

Lately, I've been thinking a lot about form handling in Cocoon. The reason for this is that I will very soon start a project which is basically a large set of forms (about 40 different screens used to fill an XML document containing collections having up to 1000 or 2000 items). As part of our proposal for the project, I did some prototyping with XMLForm (+flowscript) and liked its lightweight markup and the strong separation it enforces between form definition and form layout. But I disliked its poor syntactical validation facilities. On the other side, we have Woody which is very good at validating data but which I find heavy to use and defines its own schema language. So this RT is my attempt to make a synthesis of the good and bad points of both frameworks, augmented with my own ideas, so that we can move towards a single unified form handling package in Cocoon.

hear, hear!

it is not going to be an easy task for sure... the *one size fits all* -goal is not always reachable:

indeed: on the one side a framework like this should be _useful_ (in the sense of being complete and fast to use) but: on the other hand should not be limiting anyone to _only_ the envisioned possibilities of its creators... (that is always the case, but I guess you understand what I'm trying to say)

the 80-20 rule will hopefully guide us (since that one is often creating the itches we want to scratch)

Disclaimer : I don't want to start a war between Woody and XMLForm, but just try to analyze what we have today and expose what I (hence it's subjective) consider as good. Discussion is of course welcomed. Also, I may have missed some features of one or the other framework. In that case, please don't shoot at me, but be kind enough to explain what I missed !

will do, and same position here, I can learn some from jxforms, and Woody can benefit from your proposed enhancements in the process

Also, I'll speak about XMLForm, even if it's somewhat dead and replaced by JXForms (essentially a cleaner rewriting of the XMLFormTransformer and an update of the markup to the latest XForms draft), because all criticisms about XMLForm below come from the original XMLForm and not the JXForms work.

---oOo---
General overview
----------------
Both Woody and XMLForm use the same basic principles :

1/ Content production : a form template is "instantiated", i.e. it is filled with values coming from a data model, and the instantiated form is transformed to the target language (e.g. HTML) using generic and/or custom stylesheets that know how to render the various widgets.

2/ Form validation : upon form submission, values are validated and stored into a data model, and violations are produced if some validation error occurs (validations involving several fields are also possible). In case of error, the form can be redisplayed with the violations.

But, as we will see below, the notions of form template, data model and validation are very different in Woody and in XMLForm.

---oOo---
Form definition
---------------
Woody separates form definition, form template and form instance (3 different namespaces). The form definition is a kind of schema language

the new form binding introduces a 4th namespace but the good news is that the form-instance tags are not to be developer-written (hm, but the xsl converting it into whatever is dealing with it)

that defines every widget in the form with its label, datatype and validation constraints. The template contains references to form fields mixed with foreign markup (such as HTML). It is instantiated using the WoodyTransformer : every field present in the template is replaced by the corresponding instance according to the form definition.

yes.

one small remark: you could choose not to use the template. in that case one uses the WoodyGenerator which will produce the FULL XML representation of the form-instance at the start of your pipe (foreign markup is then typically decorated on the stream by any wild mix of xslt, xinclude, ...)

but I get your point:

if I got this correct then the big advantage of xmlforms you stress here is that the 'template' defines the _model_ while woody explicitly has a separate file for the latter, correct?

in fact woody introduced the template approach to be able to skip the XSLT requirement, so opting for no-template will leave you in probably an even worse spot...

Woody has no notion of application model, as it stores field values in its own data structure, which must be read and written to the application model. Work is underway in this area with a JXPath based binding.

yep

XMLForm has only one markup, inspired by the W3C's XForms specification. This markup is more or less equivalent to the Woody template (it accepts foreign markup), which is instantiated ("augmented" would be better) with either the XMLFormTransformer/JXFormsTransformer or the JXFormsGenerator. Form fields contain XPath references to the data model, which can therefore have an arbitrary complexity.

<my-opinion> XMLForm is way easier to setup to produce forms : a single file, a data model containing any mixture of objects handled by JXPath (JavaBeans, DOM elements, etc), XPath expressions everywhere, and you're done. But as soon as there's a need for data whose formatting is more than toString(), such as dates and float values, and even more in an I18Nized environment, XMLForm shows strong limitations, mainly related to lack of proper formatting functions in XPath.

As JXPath supports extension functions, building a library of formatting functions can be a solution to circumvent XPath's reduced function set. But we'll see below that there's still a problem with parsing submitted form data.

Woody, on the other hand, is more complicated to set up, as two files are needed (form definition and form template), with many cross-references (field IDs). But Woody shines for complicated formatting (see <convertor> directives) and I18N.

IMO, Woody's separation of concerns between form definition and template is not that good. Woody would be easier to use if the definition file

every identification of a concern indeed creates a new responsibility to be taken up... it has been the classic approach to have those different responsibilities be expressed through different files/namespaces (since multiple people could/should be involved, thus allowing to map responsibilities onto person-attached files)

in this case this is leading us to 'woody requires a lot of configuration' (your remark on this is not new and the upcoming binding is likely to make it worse)

so any suggestions on sensibly cutting some of the config trouble makes all the sense in the world.

the full process of getting anywhere has 2 aspects IMHO: 1/ identify all the concerns and separate wisely 2/ recombine sensibly in 'assembled' typical usages that lower the 80-20 itch for specific use cases... (in dream mode: generate the different config needs from a single source that might be as wild as JDBC metadata information?)

Woody up to now had some stress on 1/ (and I think we're not even there) it sure makes sense to start considering 2/ if we want to increase its 'usability'

was only a schema defining datatypes and if fields were defined only in the template. Although there is a great probability that datatypes can

mmm, I might be missing what you are saying here...

looking at the woody-definition file what I see is exactly identifying data-types, but then I'm talking about composite types rather than only single-field types (e.g. the composite 'person' type vs. his 'birthday'-date-field)

so what I might be reading wrongly here is that you would like to see the set of woody-form-definition files to evolve into some 'datatype-catalogue' ?

hopefully that would still include the composite types it focusses on now, and just provides for a reuse mechanism of forms-subforms and predefined 'field-types'

be reused for different fields and even different forms, I'm not sure using the same fields within different templates really makes sense. For example, HTML and WML browsers have such different screen sizes and interaction constraints that a single form definition can hardly be used for both.

the envisioned use case behind the current split is exactly this: if we consider the HTML and WML front end versions of the same use cases then most likely the templates will need to change, but the model could remain the same, no?

e.g. in the case of WML you'ld probably split the complete editing of the one form-model over a wizard-managed series of templates... (and/or you would choose not to show the optional fields of the model)

to be hoped for is that both cases would reuse:
- label/help/hotkey info (i18n)
- validation rules on the complete form-model
- the logic loading and submitting the complete filled-in 'form-model' back to whatever back-end

other examples would include:
- deploying the web application in an ASP model where the different tenant-companies only provide their templates (leaving out optional fields, splitting large models in different ways,...) and reuse the same back-end logic
- not using any template at all, but just exposing a ReST-like URL pattern people can send requests to, and receive an XML back?

I'ld have to admit these are not classic use cases but hardly to be overlooked by a modern form framework if you ask me... again I think the goal of being widely usable probably starts by separating the different concerns in the core of the thing and then carefully combining some of them back again in very specific/targeted ways of deployment? (I see current cocoon success as proof of that pudding)

Reusing datatypes for different fields would also increase the overall application consistency : as of today, if two fields have the same

agree, datatype-catalogue and form-subform recursiveness for composite types could achieve this

I still see the split from the template as having a different use.

so by using all these words to be intellectually correct I just forget to stress that I very much share your fear for the 'heaviness'

How to get a very practical way to setup a one file config approach for simple stuff :-( without compromising the wider usage and flexibility?

suggestions welcome...

datatype and constraints, these must be duplicated. This could also open the door to other schema languages (WXS, RNG, etc). </my-opinion>

---oOo---
Population and validation
-------------------------
"Population" is the term used to designate the action of "filling" the data model with form-submitted data. "Validation" is the action of controlling that submitted data is valid, i.e. that it satisfies some syntactic and semantic constraints.

Upon form submission, XMLForm traverses all request parameters and tries to set their value on the data model using JXPath. A feature allows to filter request parameters that are not part of the data model. If the data model was filled correctly, a validation is performed using Schematron. This allows to have finer-grained or inter-field controls, again using XPath expressions. Each of these two phases can produce violations, which are recorded in the Form object.

Upon form submission, Woody traverses the form's widget tree, and each widget is responsible for parsing the corresponding request parameter and validating its value. Non-visual widgets are also provided to perform inter-field controls.
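
The widget-driven approach described above can be sketched roughly as follows. This is only an illustration in plain Java: the names WidgetTreeSketch, Widget, IntegerField and process are hypothetical and are not Woody's actual API, and the "tree" is flattened to a list to keep things short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; this is not Woody's actual API.
final class WidgetTreeSketch {

    /** A widget reads its own request parameter, parses it and validates it. */
    interface Widget {
        String id();

        /** Returns a violation message, or null when the value is acceptable. */
        String readFromRequest(Map<String, String> request);
    }

    /** An integer field constrained to a closed range. */
    static final class IntegerField implements Widget {
        private final String id;
        private final int min;
        private final int max;
        private Integer value; // strongly typed value, set only on success

        IntegerField(String id, int min, int max) {
            this.id = id;
            this.min = min;
            this.max = max;
        }

        public String id() {
            return id;
        }

        Integer value() {
            return value;
        }

        public String readFromRequest(Map<String, String> request) {
            value = null; // forget any previous value before parsing again
            String raw = request.get(id);
            if (raw == null || raw.trim().isEmpty()) {
                return "required";
            }
            int parsed;
            try {
                parsed = Integer.parseInt(raw.trim());
            } catch (NumberFormatException e) {
                return "not an integer";
            }
            if (parsed < min || parsed > max) {
                return "out of range " + min + ".." + max;
            }
            value = parsed;
            return null;
        }
    }

    /** Walks the widgets and collects violations keyed by widget id. */
    static Map<String, String> process(List<Widget> widgets, Map<String, String> request) {
        Map<String, String> violations = new LinkedHashMap<>();
        for (Widget widget : widgets) {
            String violation = widget.readFromRequest(request);
            if (violation != null) {
                violations.put(widget.id(), violation);
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        List<Widget> widgets = new ArrayList<>();
        widgets.add(new IntegerField("age", 0, 150));
        Map<String, String> request = new HashMap<>();
        request.put("age", "abc");
        System.out.println(process(widgets, request));
    }
}
```

Because the loop is driven by the widgets and not by the request, a parameter that has no widget is simply never read.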

<my-opinion> Here again, XMLForm is very easy to use but shows some strong limitations : because it's designed after XForms, XMLForm has no feature to specify how to parse form parameters (strings) into strongly typed data. So even basic parsing of e.g. dates is not possible, and locale-dependent parsing is clearly not possible.

The Schematron validation has less restrictions since it deals with the populated data model, and thus on strongly typed data, if they could be parsed in the population phase.

XMLForm also has what I consider a strong security weakness : the default request parameter filter rejects only special parameters such as "cocoon-action-*", which means that a request can be hacked that modifies a part of the data model that wasn't available as a form field. Considering that programmers are lazy (as I am), the form model will often be the actual business object. The consequences of providing a form to a user to update her location information can be catastrophic if the User class contains "address", "phoneNumber", but also "accessRights"...
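
To make the risk concrete, here is a minimal sketch in plain Java. It does not use XMLForm or JXPath: the model is a simple map standing in for the User object, and PopulationFilterSketch, populateBlindly and populateRendered are hypothetical names chosen for the illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; a Map stands in for the User business object.
final class PopulationFilterSketch {

    /** Copies every request parameter into the model, whatever its name. */
    static void populateBlindly(Map<String, String> request, Map<String, String> model) {
        model.putAll(request);
    }

    /** Copies only the parameters that match fields actually rendered in the form. */
    static void populateRendered(Map<String, String> request,
                                 Set<String> renderedFields,
                                 Map<String, String> model) {
        for (String field : renderedFields) {
            if (request.containsKey(field)) {
                model.put(field, request.get(field));
            }
        }
    }

    static Map<String, String> newUser() {
        Map<String, String> user = new HashMap<>();
        user.put("address", "old address");
        user.put("phoneNumber", "000");
        user.put("accessRights", "guest");
        return user;
    }

    public static void main(String[] args) {
        // A forged request: the form only offered address and phoneNumber.
        Map<String, String> request = new HashMap<>();
        request.put("address", "new address");
        request.put("accessRights", "admin");

        Map<String, String> unsafe = newUser();
        populateBlindly(request, unsafe);
        System.out.println("blind:    " + unsafe.get("accessRights"));

        Map<String, String> safe = newUser();
        Set<String> rendered = new HashSet<>(Arrays.asList("address", "phoneNumber"));
        populateRendered(request, rendered, safe);
        System.out.println("filtered: " + safe.get("accessRights"));
    }
}
```

The second method is the allow-list idea: the set of rendered fields decides what may be written, not the set of submitted parameters.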

W3C XForms, which inspired XMLForm, is a client-side specification targeted at producing XML documents validated by a WXS (W3C XML Schema). But XMLForm is server-side, and doesn't enforce any particular schema language. This means that very few features of XForms are actually used except the form markup and that all has to be invented to produce a featured server-side form framework, particularly in this population & validation phase.

Woody, by traversing the widget tree that was used to produce the form, doesn't have the security weakness of XMLForm since only parameters present in the produced form are considered. Also, its strong parsing and I18N features make custom formatting really easy.

But, being limited to the form's data model, complex validations involving form data and application data can be difficult to do with Woody and will need custom Java code.

yep, touching the 80-20 rule again: validation can become really arbitrarily complex, so an escape to provided java-code is not to be prevented IMHO

how much of the stronger validation will be available under declarative form is then a matter of time, itches and scratches

Finally, Woody uses its own expression language, which IMO is not a good choice if we consider that "standard" expression languages such as Jexl exist and are already used in other Cocoon blocks.


no real opinion here,
consistency with other stuff does make sense of course

I might be wrong but the simple expression syntax is more a means for reusing by assembling smaller parts of ready Java code

the alternative would be that you'ld need to write a Java class that does that assembly and then the definition file would list the full qualified classname of that new beast.

I don't know jexl enough to evaluate it for this usage... your advice and hints are welcomed

</my-opinion>

---oOo---
Mapping to the application data model
-------------------------------------
A form is useless if its content cannot be mapped in some way to the application data model.

XMLForm has no special provision for mapping form data to application data, but using JXPath makes it easy to fill any JavaBean or any DOM structure. Post-validation application behaviour can be added to either a subclass of AbstractXMLFormAction or in a flowscript.

Woody currently does not provide anything to map form data to application data and all this must be coded either in a subclass of AbstractWoodyAction or in a flowscript. But there's work underway to add binding features to Woody, the first incarnation being based on JXPath.

<my-opinion> XMLForm makes it easy (as pointed out above) for the lazy programmer to set the application data as the form model : mapping is then immediate and totally transparent. But along with the security problem mentioned above, this also means that when a form population & validation fails, it is very likely that some fields already have been modified, potentially leaving the data model in an inconsistent state.

So the secure and clean solution is to use a form-specific data model (a JavaBean, DynaBean or XML DOM), but this requires then custom code to copy form data to the application data model, thus losing the simplicity provided by JXPath.
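
A rough sketch of that "validate first, copy afterwards" idea, in plain Java. The User class, its fields, the phone number pattern and the submit method are all assumptions made for the illustration; they do not come from XMLForm or Woody.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of a form-specific staging step.
final class StagedCommitSketch {

    /** Application-side object; stands in for any business bean. */
    static final class User {
        String address = "";
        String phoneNumber = "";
    }

    /**
     * Validates every submitted value first and touches the user only
     * when no violation was found, so a failed submission changes nothing.
     */
    static Map<String, String> submit(Map<String, String> request, User user) {
        Map<String, String> violations = new LinkedHashMap<>();

        String address = request.get("address");
        if (address == null || address.trim().isEmpty()) {
            violations.put("address", "required");
        }

        String phone = request.get("phoneNumber");
        // The pattern is an arbitrary example constraint, not a real-world rule.
        if (phone == null || !phone.matches("\\+?[0-9 ]{6,20}")) {
            violations.put("phoneNumber", "invalid phone number");
        }

        if (violations.isEmpty()) {
            // Copy step: only reached once the whole form is valid.
            user.address = address.trim();
            user.phoneNumber = phone;
        }
        return violations;
    }

    public static void main(String[] args) {
        User user = new User();
        Map<String, String> request = new HashMap<>();
        request.put("address", "Some Street 1");
        request.put("phoneNumber", "abc");
        System.out.println(submit(request, user));
        System.out.println("address after failed submit: '" + user.address + "'");
    }
}
```

The cost mentioned in the text is visible here: the copy step is hand-written for each form, which is exactly what a declarative binding would try to avoid.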

The ongoing work on Woody binding potentially allows a great range of target data models : the current JXPath binding will make it easy to map form data to an arbitrary data structure, without XMLForm's limitations since parsed and strongly typed data will be stored in the application model. But we can also imagine other declarative bindings targeted at e.g. relational databases (no intermediate bean), EJBs, etc.

yes.

haven't looked into jxpath deep enough (just used it now, didn't get into the internals yet) but my current feeling would be to cater for these other backend-models by writing a specific JXPathContext wrapper... as such the effort would be reusable more widely?

</my-opinion>

---oOo---
I18N
----
I18N features should be separated in two main areas :
- I18Nization of form labels and item values (i.e. combobox labels)
- I18Nization of textbox inputs, such as floating point numbers, dates, etc.

For the first item, both XMLForm and Woody accept any foreign markup in widget labels, including <i18n:*> tags for use with the I18NTransformer. Woody lacks the equivalent to <xf:help> but this was recently discussed and should be added soon. XMLForm also allows labels and similar items to have their content fetched from the form model using a "ref" attribute. In that case, however, only characters are produced, and not mixed content.

For the second item (i18nization of inputs), XMLForm has no support, as it hardly supports custom formats, as explained previously. Woody, on the other hand, has strong support for i18nization of inputs through its <convertor> tag that supports locale-specific patterns for formatting and parsing.
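
As an illustration of what locale-specific parsing and formatting of inputs involves, here is a small sketch using only the standard Java library (java.text and java.time, so a recent JDK is assumed). It is not Woody's <convertor> implementation; LocaleConvertorSketch and its method names are hypothetical.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper, not Woody's <convertor> implementation.
final class LocaleConvertorSketch {

    /** Parses a decimal number using the separators of the given locale. */
    static double parseDecimal(String text, Locale locale) {
        NumberFormat format = NumberFormat.getNumberInstance(locale);
        ParsePosition position = new ParsePosition(0);
        Number parsed = format.parse(text, position);
        // Reject input that could not be parsed or was only partially consumed.
        if (parsed == null || position.getIndex() != text.length()) {
            throw new IllegalArgumentException("not a number in " + locale + ": " + text);
        }
        return parsed.doubleValue();
    }

    /** Formats a decimal number using the separators of the given locale. */
    static String formatDecimal(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    /** Parses a date with an explicit pattern, e.g. one chosen per locale. */
    static LocalDate parseDate(String text, String pattern) {
        return LocalDate.parse(text, DateTimeFormatter.ofPattern(pattern));
    }

    public static void main(String[] args) {
        System.out.println(parseDecimal("1,5", Locale.FRANCE));
        System.out.println(formatDecimal(1.5, Locale.FRANCE));
        System.out.println(parseDate("22/07/2003", "dd/MM/yyyy"));
    }
}
```

The point is that the same submitted string can only be turned into a typed value once a locale or pattern is known, which is the information a plain XPath reference does not carry.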

<my-opinion> XMLForm's strong limitations for values formatting also apply to the i18n domain, whereas Woody not only provides strong support for value formatting, but also strong support for locale-dependent formatting.

XMLForm's "ref" attribute on form labels allows messages to be part of the form model, and thus be dynamic, but I'm not sure this is of real use. And if it is, Woody may be able to provide an equivalent through nested tags in the <wd:label> element. </my-opinion>

---oOo---
Conclusion
----------
XMLForm has a lot of success because it has filled a giant need in Cocoon applications to handle forms. Moreover, it fits nicely with flowscript, and this combination builds an easy to use solution for form handling. But using it in more and more complex use cases shows some strong limitations that are largely related to its desire to mimic XForms. And I'm not sure these limitations can be removed without diverging largely from the XForms approach.

These limitations were obviously taken into account early in Woody's design, which makes it stronger at handling data formatting and enforcing semantic constraints. But Woody, by over-separating concerns, is heavier to use.

Considering all the pros and cons, I think Woody, which is still in its infancy, is more promising on the long term and should be promoted, once

mmm, let's make that * puberty (flirting around with different ideas, allowing oneself to think the wrong ones even :-))

which means we still have to get through * adolescence (forming the real identity, gradually taking up responsibility towards early adopters) and * maturity (living its life, being used) before we reach the stage of * aged wisdom (where we have removed everything there was to remove, dear Antoine)

IMHO a number of good discussions and try out code could get us quickly past the first two stages

featured enough, as the preferred form handling package in Cocoon.

---oOo---
Proposals
---------
We've seen that Woody requires to separate form definition from form template. I think (Bruno, correct me if I'm wrong) this constraint comes from the fact that the form _is_ the model, and thus must be filled with data _before_ being processed by the form template.

yes,

The ongoing work on form binding considers binding as a process surrounding form population and validation : the application->form binding fills an existing form, and the form->application binding transfers form data to the application model once the form is correctly validated.
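
The two directions of binding described above could look roughly like this. It is a sketch under assumptions: form and model are plain maps, a declarative table maps field ids to model keys, and the names BindingSketch, Binding, loadFormFromModel and saveFormToModel are chosen for the illustration and should not be read as the API of the binding work in progress.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; maps stand in for the form and the application model.
final class BindingSketch {

    /** The two directions of binding that surround population and validation. */
    interface Binding {
        /** application->form: fills an existing form before it is displayed. */
        void loadFormFromModel(Map<String, Object> form, Map<String, Object> model);

        /** form->application: to be called only after successful validation. */
        void saveFormToModel(Map<String, Object> form, Map<String, Object> model);
    }

    /** Declarative binding: a table from form field id to model key. */
    static final class MapBinding implements Binding {
        private final Map<String, String> fieldToModelKey;

        MapBinding(Map<String, String> fieldToModelKey) {
            this.fieldToModelKey = new LinkedHashMap<>(fieldToModelKey);
        }

        public void loadFormFromModel(Map<String, Object> form, Map<String, Object> model) {
            for (Map.Entry<String, String> entry : fieldToModelKey.entrySet()) {
                form.put(entry.getKey(), model.get(entry.getValue()));
            }
        }

        public void saveFormToModel(Map<String, Object> form, Map<String, Object> model) {
            for (Map.Entry<String, String> entry : fieldToModelKey.entrySet()) {
                model.put(entry.getValue(), form.get(entry.getKey()));
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> table = new LinkedHashMap<>();
        table.put("street", "user.address.street");

        Map<String, Object> model = new HashMap<>();
        model.put("user.address.street", "Old Street");
        model.put("user.accessRights", "guest"); // not bound, so never touched

        Map<String, Object> form = new HashMap<>();
        Binding binding = new MapBinding(table);
        binding.loadFormFromModel(form, model);
        form.put("street", "New Street"); // stands for population + validation
        binding.saveFormToModel(form, model);
        System.out.println(model);
    }
}
```

Only keys listed in the table are ever read or written, so the binding doubles as the list of model parts the form is allowed to touch.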

Now we can imagine to have a "live" application->form binding occurring at form definition time which could allow simultaneous building of the form definition and population of form data from the binding. This

above sounds like allowing 'default' values in the form definition

combined with the fact that the default values themselves would be collected at form-instantiation time?

care to elaborate how you saw this happen?

feature could remove the need for a separate form definition and could be implemented by a WoodyTemplateGenerator taking as input a template file containing field definitions. A kind of "definition by example" (like the QBE that exists in Excel and various database systems).

This "defining-template" would only define fields and not datatypes. These datatypes could be either inferred from the application model through the binding or fetched from a separate schema file (the current form definition, with only datatype definitions).

On the other hand, form->application binding cannot be live, since we must ensure that all submitted values are valid before modifying the application data.

IIUC you're giving a shot at redistributing the information from the three current config files (form-definition, form-template and form-binding) into two reshaped ones: form-template and widget-datatype-catalogue?

maybe an example of how things would look like would give us more to chew on?

---oOo---

Thanks for reading so far. As I expect this post to generate lots of discussions, I suggest to create separate threads for particular subjects (particularly the final "proposals" chapter) in order to keep the discussion focused.

didn't do that at this moment yet... if everything remains related I prefer one combined thread to multiple separate ones (but that's just me, I'll be happy to go with the flow and see some separate [woody proposal] threads pop up)

Sylvain


-marc=
--
Marc Portier                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at              http://radio.weblogs.com/0116284/
[EMAIL PROTECTED]                              [EMAIL PROTECTED]
