Forrest (a.k.a. xml.apache.org 2.0)

Stefano Mazzocchi Mon, 17 Dec 2001 00:59:16 -0800

[sorry for the cross-post: this is a general issue, but I'd like the
Cocoon people to know what I'm doing so that they might give me a hand :)]


I've started an effort that will, hopefully, bring a much more useful
documentation system to xml.apache.org and, possibly, to the entire
ASF, even if political and ego obstacles may get in the way.

I personally don't care: this effort is mainly to create a better
documentation infrastructure following the goals outlined below. I
started the Cocoon project three years ago exactly for this reason, and
now that it has all the features I needed, I think I can attack the
problem from a very wide angle.

The site building system will be targeted toward xml.apache.org, but
I'll keep a very broad perspective, making it possible to adapt the
system to other apache.org projects with very few changes.

BIG DISCLAIMER: however, whether this happens or not, I personally don't
care. For sure, don't count on me wasting my time on fighting about 'my
DTD is better than yours' or 'my system is
faster/smaller/cleaner/easier-to-use/more-extensible than yours'.

I'll come up with a system that works and then you guys will vote on
what to do. I consider this an exercise to show Cocoon's full potential
(which, objectively, beats the pants out of all the other systems used
around Apache), but nothing more than this.

                                    - o -

Ok, now that I stated this, let's get into the effort goals.

GOALS
-----

1) Speed: the current xml.apache.org is slow. Empirical studies on
learning processes indicate that if a page takes more than 10 seconds to
load over a 56 kbps modem, the cognitive experience is degraded.

2) Coherence: the current xml.apache.org is extremely incoherent. Again,
it's easy to understand that a lack of coherence between subproject docs
is perceived as (and sometimes reflects!) a lack of cooperation.

3) Navigation: the navigation experience on the current xml.apache.org is a
nightmare. There is no way to perceive the basic elements of spatial
navigation: where am I? where can I go? how do I go back? how do I go
there?

4) Depth: the current xml.apache.org page layout forces a flat hierarchy
of levels. The current Cocoon documentation somewhat extends this, but
the visual design doesn't reflect the added depth. Visual cues are
extremely important to allow easy and immediate navigation even at the
deepest level.

5) Usefulness: xml.apache.org hosts powerful software, but the site
itself is not powerful. It should be a window on the information useful
to both users and developers, with friendly features such as
print-friendly versions of single pages and of the whole per-subproject
documentation, pagination of long articles, site-restricted search,
graphs of project-related data and so on.

6) Simplicity: xml.apache.org is done by volunteers, on all levels.
Nobody is directly paid to do this. Not even me. So, if the above goals
are met but the system is not simple and immediate to use for those who
have to maintain and update the information, the result will be void
within a short period of time.

7) Extensibility, Flexibility, Modularity: web sites, just like software,
are living entities that adapt to their environment. The build system
must not restrict the ability to extend the information architecture in
an evolutionary way.

8) URI Solidity and Future Compatibility: URIs are contracts between the
publisher and the user. Human users have the ability to estimate the
long-term validity of these contracts and 'route around' possible broken
links, while machine users cannot. The goal is to come up with a system
that can generate a web site with strong URIs.
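To put the speed goal (1) in numbers: a back-of-the-envelope estimate,
assuming the full nominal 56 kbps and ignoring latency, protocol overhead
and compression (real modem throughput is usually lower), gives a
per-page budget, images included, of roughly:

```latex
\frac{56\,000~\text{bit/s} \times 10~\text{s}}{8~\text{bit/byte}}
  = 70\,000~\text{bytes} \approx 68~\text{KiB}
```

so page templates and graphics have to stay well below that figure.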


Design Decisions
----------------

staticity: even if I think that the availability of a dynamic publishing
system would be beneficial, considering the web site load, the load of
the apache machines and the state of the JVM for FreeBSD and the
political problems behind all this, it's *much* easier (at least for
now) to have a static version of the site batch-produced and then placed
into the web-serving space.

automaticity: the site will be automatically generated out of files
stored in CVS. The idea is to have GUMP-like nagging features that send
email to the various mail lists, using XML validation to estimate the
'integrity' of the committed docs.

For this reason, in honor of Sam Ruby's great work, and for the
resonance with 'forest' (a huge number of trees, i.e. XML files), I call
this effort "Forrest".

I believe that, together, Forrest and Gump will help bring Apache
quality one step up (moreover, as in the name, Forrest wraps Gump and
will publish its generated data, providing more overall coherence).
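A minimal sketch of the nagging check, in Python. Assumptions: the
standard library's ElementTree only checks well-formedness, not validity
against a DTD (the real thing would need a validating parser); the
function names, the message wording and the checkout layout are all
hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def load_documents(root):
    """Read every *.xml file under a checked-out CVS module (hypothetical
    layout). Raw bytes are kept so the parser honours each file's own
    encoding declaration."""
    return {str(p): p.read_bytes() for p in sorted(Path(root).rglob("*.xml"))}


def find_parse_errors(documents):
    """Return {name: error} for documents that are not well-formed XML.
    NOTE: well-formedness only; this does not validate against a DTD."""
    failures = {}
    for name, data in documents.items():
        try:
            ET.fromstring(data)
        except ET.ParseError as err:
            failures[name] = str(err)
    return failures


def nag_message(subproject, failures):
    """Body of a (hypothetical) nag email, or None when everything parses."""
    if not failures:
        return None
    lines = [f"Forrest: {len(failures)} broken XML file(s) in {subproject}:", ""]
    lines += [f"  {name}: {error}" for name, error in sorted(failures.items())]
    return "\n".join(lines)
```

In a nightly run, `find_parse_errors(load_documents(checkout_dir))` would
feed `nag_message`, and a non-None result would be mailed to the list.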

                                        - o -

separation of concerns
----------------------

There are three concern islands; here is a list of their duties.

subproject
==========

each subproject should provide:

3.a) a 'description' file that includes information on the codebase, its
description, its released versions, its CVS modules, its CVS tags, its
mail lists and its documentation sets (yes, a subproject might have more
than one: think of Xerces1/Xerces2, Xalan1/Xalan2, Cocoon1/Cocoon2).
[proposed filename: /description.xml]
 
3.b) a 'committers info' file that includes information on the
committers, along with a short bio, an email address and a picture of
them. [proposed filename: /committers.xml]

3.c) a 'change log' file that includes information on changes and
software releases [proposed filename: /changes.xml]

3.d) a 'todo list' file that includes the information on things to do
and who volunteered for doing it [proposed filename: /todo.xml]

3.e) a 'news' file that includes events and useful information that
should be made available to the general public.

then, for each documentation set (whose location is taken from the
description file):

3.f) a 'table of contents' file that indicates the hierarchical sequence
of the files and where to find them in the CVS repository. This is kept
as a single file to allow document writers to maintain 'coherence' and
visualize the entire part. This is equivalent to the stylebook book.xml
file but with full nesting capabilities.

3.g) the pages that compose the documentation (their location is taken
from the ToC file)
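Since the ToC DTD is still to be agreed upon, here is only a Python
sketch of how a fully nested ToC could drive navigation (the "where am
I? how do I go back?" questions from the goals). The `<toc>`/`<entry
label href>` vocabulary and the sample pages are hypothetical
placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested ToC; element and attribute names are placeholders.
SAMPLE_TOC = """<toc>
  <entry label="Cocoon" href="index.html">
    <entry label="Installing" href="install.html"/>
    <entry label="User Guide" href="userguide/index.html">
      <entry label="Sitemap" href="userguide/sitemap.html"/>
    </entry>
  </entry>
</toc>"""


def breadcrumbs(toc_xml):
    """Map each page href to the trail of labels leading to it ("where am
    I?"). Pages are recorded depth-first, i.e. in reading order."""
    trails = {}

    def walk(node, trail):
        for entry in node.findall("entry"):
            here = trail + [entry.get("label")]
            if entry.get("href") is not None:
                trails[entry.get("href")] = here
            walk(entry, here)

    walk(ET.fromstring(toc_xml), [])
    return trails


def neighbours(trails, href):
    """Previous and next pages in reading order ("how do I go back/on?")."""
    order = list(trails)
    i = order.index(href)
    prev_page = order[i - 1] if i > 0 else None
    next_page = order[i + 1] if i + 1 < len(order) else None
    return prev_page, next_page
```

Because the whole hierarchy lives in one file, both the breadcrumb trail
and the previous/next links fall out of a single tree walk.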

Log scanner
===========

The log scanner is a set of scripts that scan the logs of CVS, the mail
lists and the web site to gather information on:

 1) mail list activity (subscribers and messages)
 2) web site activity (hits and downloads)
 3) CVS activity (general commits, commits per person)

This scanner provides this information in a simple format that can be
easily fed into the documentation building system.
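As an illustration of what such a "simple format" might look like, a
Python sketch of the CVS part only. It assumes the raw `cvs log` output
has already been reduced to one hypothetical `date author module` line
per commit; parsing the real log format is left out.

```python
from collections import Counter


def commits_per_person(lines):
    """Count commits per author from pre-digested 'date author module'
    lines (a hypothetical intermediate format, one line per commit)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        _date, author, _module = line.split(None, 2)
        counts[author] += 1
    return counts


def to_tsv(counts):
    """Emit 'author<TAB>commits', busiest committer first."""
    return "\n".join(f"{author}\t{n}" for author, n in counts.most_common())
```

A tab-separated table like this is trivial for the build system to turn
into XML and then into a graph.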

Build system
============

The build system will:

1) aggregate, filter and otherwise adapt the information collected from
the various subprojects' CVS modules, from the log scanner and from the
GUMP run into static HTML files (for the browser pages), static PDF
files (for print-friendly versions) and JPEG images (for graphs).

2) generate navigation information in all the pages

3) validate all the required XML files and send nag messages to the
mail lists if validation fails.

4) generate httpd-related auxiliary files (.htaccess, header.html,
footer.html and so on).

5) upload online only the parts that built without failures.

The goal is to have the system run completely autonomously: this
follows the Gump approach. [Sam, I'll need your help here, since I don't
have an account on nagoya]
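Steps 3 and 5 together imply that one broken subproject should not hold
back the others. A Python sketch of that control flow follows;
`build_one` stands for a hypothetical per-subproject build (generation
plus validation) that returns a list of error strings, empty on success.

```python
from dataclasses import dataclass, field


@dataclass
class BuildResult:
    subproject: str
    errors: list = field(default_factory=list)

    @property
    def ok(self):
        return not self.errors


def run_build(subprojects, build_one):
    """Build each subproject independently; a crash in one is recorded as
    a failure instead of aborting the whole run."""
    results = []
    for name in subprojects:
        try:
            errors = list(build_one(name))
        except Exception as exc:
            errors = [f"build crashed: {exc}"]
        results.append(BuildResult(name, errors))
    return results


def plan_publication(results):
    """Split results into what gets uploaded and what triggers nag mail."""
    upload = [r.subproject for r in results if r.ok]
    nag = {r.subproject: r.errors for r in results if not r.ok}
    return upload, nag
```

Failed subprojects would simply keep their previously published pages
online until the next successful run.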

                                        - o -

Things to decide
================

1) DTDs
-------

The Cocoon project already has DTDs for 'documentation', 'change logs',
'todo list' and 'specifications'. They mainly use XHTML tags and are
very easy to learn (they are an expansion of the original stylebook
DTDs, so it's pretty easy to automatically adapt existing stylebook
documents to this improved DTD while keeping the simplicity we had
before).

The rest of the required DTDs (description, news and ToC) must be agreed
upon (I'll work on them in the next few days).

2) URIs
-------

In order to achieve the future-compatibility goal, we must come up with a
guideline for URIs.

For example, the Cocoon project had /cocoon and /cocoon2; then Cocoon
2.0 went final and we moved /cocoon2 into /cocoon and /cocoon into
/cocoon1, creating a shit-load of broken links.

Two solutions were proposed (add your own if you have more):

 a) put version-specific information in the URIs and use mod_rewrite to
adapt. For example:

 xml.apache.org/cocoon/1.8.2/index.html
 xml.apache.org/cocoon/2.0b1/index.html
 xml.apache.org/cocoon/2.0b2/index.html
 xml.apache.org/cocoon/2.0rc1/index.html
 xml.apache.org/cocoon/2.0rc2/index.html
 xml.apache.org/cocoon/2.0/index.html

then

 xml.apache.org/cocoon/ -> xml.apache.org/cocoon/2.0/index.html

The problem is that while those versioned URIs never break, the
version-less redirected URI changes with each release, so it doesn't
reduce broken links. Also, users are probably better off downloading the
required version and looking into the shipped docs, and keeping every
version online results in unnecessarily big web sites.

 b) use semantically meaningful yet version-less URIs

  xml.apache.org/cocoon/previous/ -> points to the previous generation docs
  xml.apache.org/cocoon/          -> points to the latest docs
  xml.apache.org/cocoon/next/     -> points to the next generation docs

which removes the need to keep every version of the docs online, yet
still provides the latest docs along with the previous and next
generations (for Cocoon today, these would be Cocoon 1.8.2, Cocoon 2.0
and Cocoon 2.1-dev).

The problem of broken links isn't solved, though: every time there is a
transition, previously established links may break if the docs ToC
changes from one generation to the next.
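To make the trade-off between (a) and (b) concrete, a Python sketch of
both behaviours for a hypothetical /cocoon/ tree. In (a) the redirect
itself would be a mod_rewrite rule; the function below only models
where it points. In (b) the build could at least detect which
version-less URIs a transition would break, so they can be reported.
The function names, the "2.1" release and the page names are
hypothetical.

```python
def resolve_versioned(path, releases, prefix="/cocoon/"):
    """Scheme (a): versioned URIs are served as-is; anything else is
    redirected into the newest release. `releases` is assumed to be
    ordered oldest to newest."""
    if not path.startswith(prefix):
        raise ValueError(f"not under {prefix}: {path}")
    rest = path[len(prefix):]
    if rest.split("/", 1)[0] in releases:
        return path  # a versioned URI: stable forever
    return f"{prefix}{releases[-1]}/{rest or 'index.html'}"


def broken_by_transition(current_pages, incoming_pages, prefix="/cocoon/"):
    """Scheme (b): when the next generation becomes the latest, pages of
    the current latest docs missing from the incoming ToC stop resolving
    under the version-less prefix."""
    gone = set(current_pages) - set(incoming_pages)
    return sorted(prefix + page for page in gone)


if __name__ == "__main__":
    releases = ["1.8.2", "2.0b1", "2.0b2", "2.0rc1", "2.0rc2", "2.0"]
    print(resolve_versioned("/cocoon/", releases))            # /cocoon/2.0/index.html
    print(resolve_versioned("/cocoon/", releases + ["2.1"]))  # /cocoon/2.1/index.html
    print(broken_by_transition(["index.html", "dynamic.html"],
                               ["index.html", "sitemap.html"]))  # ['/cocoon/dynamic.html']
```

The first two lines show the moving target of scheme (a); the last one
shows the kind of report scheme (b) would need at each transition.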

3) layout
---------

The layout previously proposed on this list addressed the speed problem,
but I couldn't adapt it to the depth requirements identified in the
goals above.

So, I resurrected my rusty web design skills and came up with the layout
you find attached. I've tested it on IE 5.5, NS 4.78 and Moz 0.9.5 on
win2k. 

Feedback, suggestions and criticisms are appreciated.

4) CVS location and mail list discussions
-----------------------------------------

Just like Gump, which is not a subproject on its own, Forrest doesn't
deserve that status either as long as it remains a one-man show (and my
experience tells me it will very likely remain so if the above goals are
met).

At the same time, just like Gump, it requires a CVS space.

Possible places are:

 1) xml-site
 2) xml-forrest
 3) xml-site2

for mail list discussions, solutions are:

 1) [EMAIL PROTECTED]
 2) [EMAIL PROTECTED]
 3) [EMAIL PROTECTED]
 4) [EMAIL PROTECTED]

Please, add your comments/suggestions and your votes where a decision is
required.

Thank you.

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<[EMAIL PROTECTED]>                             Friedrich Nietzsche
--------------------------------------------------------------------

new-site.zip
Description: Zip compressed data
