What does "XHTML2 as an internal document format" mean?

Ross Gardler Wed, 14 Sep 2005 16:56:36 -0700

This mail is in preparation for our upcoming IRC session on the topic of converting Forrest to use a subset of XHTML2 as its internal document format. There appear to be at least two, if not three (or even more), opinions on this. The purpose of this thread is not (at least initially) to debate each opinion, but instead to provide background information to feed into the IRC session.

If you have a suggestion for an approach then please add it to this thread. However, please avoid commenting on other proposals that have gone before (other than to say "as described by..." in cases of agreement).

The idea is for this to be an initial brainstorming thread, *not* a discussion or planning thread. We'll do that later; let's just absorb one another's ideas so we can extract the best of them all via IRC discussion. We can then come back to this thread and wrap up with our conclusions.


--------

Here's an outline of my approach:

--------

Assumptions
===========

First of all, I assume that there is no point in working on anything to do with the old skinning system. It is going to be removed in favour of views and I don't want to have to refactor things twice.

I am using forrest:views to define the various technologies that, together, provide the new skinning system. That is, those items defined in [1].



Defining the Core Pipeline
==========================

The pipeline when using views is discussed in [1], where we define the pipeline to be either:


                                           theme
                                             |
                                            \|/
src -> input plugin -> core (views) -> output plugin -> output
                        |        /|\
                       \|/        |
                    forrest:contracts

As defined in [5] or:


                                           theme
                                             |
                                            \|/
src -> input plugin -> core (views) -> output plugin -> output
                       /|\      |          |
                        |      \|/        \|/
                        +------------------+
                        |forrest:contracts |
                        |forrest:properties|
                        +------------------+

This latter pipeline was suggested because "the contracts as viewHelper should come *from* the plugin" [2] (actually, I reversed the last arrow from the original post because of this description).

[It should be noted that since these mails were written we have agreed to rename the part of forrest:views shown here in core as "structurer". I will use the term structurer in the rest of this mail.]

Both of the above are aligned with our TR document [4], which defines the stages along the central pipeline as:


Resolver -> Xifier -> Filter -> Windower -> Themer -> Serializer

Cool, lots of agreement there :-)

Fitting Forrest:Views into the Pipeline
=======================================

So, we seem to be in agreement on the core pipeline. However, there are actually two opinions on how views fit in. I am going to really rock the boat and add a third (even though one of the above is mine ;-)

Why do we need a third? Let's start off by looking at the definitions of the various parts of this pipeline:


Structurer
----------

The structurer part of a view is defined as adding "a structure to the generated page that clearly identifies all the content in the final output" [6] and [7], and further as "The structuring of the assembled page where all content is in place and structured with forrest:hooks to provide hooks for theming." [8]

OK, so it is pretty clear that the *.fv files are part of the structurer. And these belong in core; that is, the language used is defined by Forrest core itself. It is an internal format. Note this means we can use, for example, the Cocoon Portal page layout language as an input format for the structurer, or we can generate it as an output from the structurer.

Note that the structurer does *not* define any content. Therefore core should *not* have any knowledge of content.


Forrest Contracts
-----------------

Forrest:contracts are defined as "the templated content that should be inserted into the final document. These may create a new request in order to generate the content" [5] and as "Helpers (forrest:contracts) mainly adapt and transform the presentation model (pm) for the view, but also help with any limited business processing that is initiated from the view (forrest:properties)" [8]

So contracts describe how to retrieve/extract bits of content (or nuggets) to be inserted into the final document at locations defined in the *.fv files (for the structurer).
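
The split between structurer and contracts can be sketched as follows. This is only an illustration in Python, not Forrest code: the STRUCTURE list stands in for a *.fv file, the hook and contract names are invented, and wrapping each hook in a div is an assumption made for the example.

```python
# Hypothetical stand-in for a *.fv file: an ordered list of hooks, each
# naming the contracts whose output belongs at that location.
STRUCTURE = [
    ("header", ["site-title"]),
    ("main", ["content-body"]),
]

# Hypothetical contracts: each returns a nugget of content.
# The structurer never looks inside what they return.
CONTRACTS = {
    "site-title": lambda: "<h>My Site</h>",
    "content-body": lambda: "<p>Hello</p>",
}


def assemble(structure, contracts):
    """Place contract output at the locations named by the structure."""
    parts = []
    for hook, contract_names in structure:
        nuggets = "".join(contracts[name]() for name in contract_names)
        parts.append(f'<div class="{hook}">{nuggets}</div>')
    return "".join(parts)


page = assemble(STRUCTURE, CONTRACTS)
```

Note that assemble() only knows hook names and contract names; all content comes from the contracts.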


Output Plugins
--------------

An output plugin is defined as providing "a new output format. For example, the s5 plugin extends Forrest to produce HTML slides from Forrest documents." [3]

So an output plugin provides a version of a document that can be rendered, for example, HTML or FO. It may also provide a theme to describe how this should be displayed in the final rendering, e.g. CSS (FO has no separate theme, but the plugin may provide configuration info for the generated FO).

In my view there is nothing in this definition that describes *content*, and since forrest:contracts are about content they have no place in output plugins.

However, they do have a place in input plugins since they *do* define content. Some examples can be found in my recent work on the Resume plugin, where I have defined contracts to insert the various portions of a resume into documents.


Finally, they fit!
------------------

So given the definitions/opinions above, I think the processing pipeline, with views plugged in, is:


                                           theme
                                             |
                                            \|/
src -> input plugin -> core (views) -> output plugin -> output
 |          |              /|\                           /|\
 |          |               |                             |
 |          |         \ +------------------+              |
 |          +---------- |forrest:contracts |              |
 |                    / |forrest:properties|              |
 |                      +------------------+              |
 |                                                        |
 |                                                        |
 +--------------------------------------------------------+

Notice that *all* of our contracts are coming from input plugins. Why is this? The answer will become clear in the next section (I hope).


XHTML2 in Core
==============

So finally we come to the point. What does it mean for XHTML2 to be our internal document format? First (not quite there yet) let's consider why we have an internal format:

We want to convert many source formats into many output formats. We want to do this with minimal effort. So we adopt an internal format and write a series of output plugins to give us the different formats from that single internal format. Now we write a load of input plugins to convert the source formats into our internal format and voilà, we have many-to-many conversion.

So, everything coming *in* to our core must be our internal format, and everything coming *out* must be our internal format. There should be *nothing* inside core of any other format.
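
To make the saving concrete: with N source formats and M output formats, direct conversion needs N x M converters, while going through one internal format needs only N + M plugins. A minimal Python sketch, assuming a plain dict as a stand-in for the internal format (all names and formats here are invented for illustration, none of this is Forrest code):

```python
from typing import Callable, Dict

# Hypothetical internal format: a dict with a title and a body.
Internal = Dict[str, str]


def from_txt(src: str) -> Internal:
    """Input plugin: first line is the title, the rest is the body."""
    first, _, rest = src.partition("\n")
    return {"title": first, "body": rest}


def from_csv(src: str) -> Internal:
    """Input plugin: 'title,body' on a single line."""
    title, _, body = src.partition(",")
    return {"title": title, "body": body}


def to_html(doc: Internal) -> str:
    """Output plugin: sees only the internal format."""
    return f"<h1>{doc['title']}</h1><p>{doc['body']}</p>"


def to_text(doc: Internal) -> str:
    """Output plugin: sees only the internal format."""
    return f"{doc['title'].upper()}\n{doc['body']}"


INPUT_PLUGINS: Dict[str, Callable[[str], Internal]] = {"txt": from_txt, "csv": from_csv}
OUTPUT_PLUGINS: Dict[str, Callable[[Internal], str]] = {"html": to_html, "text": to_text}


def convert(src: str, source_format: str, output_format: str) -> str:
    """Core: everything in and everything out is the internal format."""
    internal = INPUT_PLUGINS[source_format](src)
    return OUTPUT_PLUGINS[output_format](internal)
```

Adding a new source format means writing one more input plugin; every existing output format then works with it unchanged.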


An Example Input Plugin
-----------------------

It is the job of our input plugins to provide the internal format. Consider an OpenOffice input plugin: it converts the OOo XML format to our internal format. What forrest:contracts does it provide?

An OOo document consists of meta-data, content (made up of pages, sections, paragraphs) and style information. So logical contracts would be various meta-data contracts (authors, statistics, abstract, keywords), content (all, page X etc.) and style (produces CSS). This way a user can decide which parts of the original document are used.
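
As a rough sketch of what such contracts might look like, here is a Python illustration. The DOC dict is a made-up stand-in for an already parsed OOo document, and the contract names and the XHTML2 fragments they emit are assumptions for the example, not an existing plugin API:

```python
from xml.sax.saxutils import escape

# Hypothetical, already-parsed OOo document (not the real OOo XML model).
DOC = {
    "meta": {"authors": ["A. Writer"], "keywords": ["forrest", "xhtml2"]},
    "pages": [["First paragraph."], ["Second page text."]],
}


def authors_contract(doc):
    """Meta-data contract: the authors as an XHTML2 list."""
    items = "".join(f"<li>{escape(a)}</li>" for a in doc["meta"]["authors"])
    return f"<ul>{items}</ul>"


def page_contract(doc, number):
    """Content contract: page `number` (1-based) as an XHTML2 section."""
    paragraphs = "".join(f"<p>{escape(p)}</p>" for p in doc["pages"][number - 1])
    return f"<section>{paragraphs}</section>"


# The set of contracts this (hypothetical) input plugin offers to a view.
CONTRACTS = {
    "meta-authors": authors_contract,
    "content-page": page_contract,
}
```

A view that only wants the second page and the author list would request just those two contracts and ignore the rest of the document.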


An Example Output Plugin
------------------------

It is the job of our output plugins to consume the internal format and produce our output format. So they take a *fully structured* document and convert it into the chosen output. Let's consider an HTML output plugin. What does it provide?

It provides a single XSL that converts XHTML2 to HTML. It may also provide an XSL to convert an internal style language into CSS (we currently do not have an internal style language, so let's not go there just yet, just planting a meme).

What about a PDF output plugin? It provides a single XSL to convert from XHTML2 to FO.
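
To give a feel for the kind of mapping the HTML stylesheet has to perform, here is a small sketch. It is written in Python rather than XSL purely for illustration, it ignores namespaces, and it handles only one difference between the formats: XHTML2 nests section elements with a generic h heading, while HTML needs h1 to h6. Mapping section to div is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET


def xhtml2_to_html(elem, depth=0):
    """Copy an XHTML2 subtree, mapping section -> div and h -> h1..h6.

    `depth` counts the enclosing section elements; a heading takes its
    level from that count, clamped to the range HTML supports.
    """
    child_depth = depth
    if elem.tag == "section":
        out = ET.Element("div", elem.attrib)
        child_depth = depth + 1
    elif elem.tag == "h":
        level = min(max(depth, 1), 6)
        out = ET.Element(f"h{level}", elem.attrib)
    else:
        out = ET.Element(elem.tag, elem.attrib)
    out.text = elem.text
    out.tail = elem.tail
    for child in elem:
        out.append(xhtml2_to_html(child, child_depth))
    return out


source = "<section><h>Title</h><p>Intro</p><section><h>Sub</h></section></section>"
html = ET.tostring(xhtml2_to_html(ET.fromstring(source)), encoding="unicode")
```

The output plugin needs no knowledge of where the content came from; it only sees the fully structured internal document.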


Concluding Where XHTML2 Fits
----------------------------

It fits in the forrest:contracts and in the internal processing within core (structurer).


How do we Implement it?
=======================

Let's first consider what we have (in the XHTML2 plugin, since this is the approach I am outlining here):


- we have an XHTML2 based site

- we have the start of the XHTML to HTML stylesheet that will be the major part of the HTML output plugin

- we have some templates converted to use XHTML2 - these will form the start of an XHTML2 input plugin

- we have a structurer sitemap that is basically the two existing views plugins thrown together

Combined, these elements will provide the content elements of a page. They do not currently work with navigation etc., since the aggregation of navigation has been removed: it belongs in the contracts, not in the structurer (as discussed above).



Roadmap
-------

Now what do we need to do?

- enable the navigation contracts

- convert all contracts to XHTML2

- break out the HTML output plugin

- add theming support

- break out the XHTML2 input plugin

- refactor (or rewrite?) the structurer sitemap (with locationmap in mind)

The Future
==========

This last step (refactor structurer sitemaps) is really part of a larger effort to address the first stage of our pipeline as defined above, that is, the resolving of the source file.


I'll leave that for a whole new Forrest Tuesday.

References
==========

[1] http://marc.theaimsgroup.com/?t=112276643700001&r=1&w=2

[2] http://marc.theaimsgroup.com/?l=forrest-dev&m=112596689428172&w=2

[3] http://forrest.apache.org/pluginDocs/plugins_0_80/pluginInfrastructure.html#outputPlugins

[4] http://svn.apache.org/viewcvs.cgi/*checkout*/forrest/trunk/site-author/content/xdocs/TR/2005/WD-forrest10.html


[5] http://marc.theaimsgroup.com/?l=forrest-dev&m=112276632331269&w=2

[6] http://marc.theaimsgroup.com/?l=forrest-dev&m=112277657832032&w=2

[7] http://marc.theaimsgroup.com/?l=forrest-dev&m=112438965225785&w=2

[8] http://marc.theaimsgroup.com/?l=forrest-dev&m=112596689428172&w=2
