Re: Forrest (a.k.a. xml.apache.org 2.0)

giacomo Tue, 18 Dec 2001 12:26:36 -0800

On Sun, 16 Dec 2001, Stefano Mazzocchi wrote:

I have some experience in using Cocoon to generate HTML sites and PDF
books with full validation turned on, so I might be of help here
(yes, Cocoon can do that out of the box, with support for CATALOG
mappings to overcome the hassle of SystemIDs pointing into the file
system instead of using HTTP URLs).


Another point is (yes, guys, it comes up again): shouldn't we move to a
more "official" DTD for our docs? The Avalon project already has parts
of its documents written in DocBook, as well as XSLT stylesheets to

        1. transform DocBook to HTML and PDF (admittedly only a subset
           of DocBook)
        2. convert the Cocoon DTDs you've mentioned somewhere below
           to the DocBook DTD (for legacy reasons :).

In a Cocoon-based documentation build system I'm using, I've chosen
an evolutionary approach to supporting the various DocBook elements by
adding XSLT templates like these:

 <!-- Fallback: report any element that has no explicit template,
      then copy it through together with its attributes -->
 <xsl:template match="*" priority="-1">
   <xsl:message>
     THE ELEMENT <xsl:value-of select="name(.)"/> ISN'T YET SUPPORTED
   </xsl:message>
   <xsl:copy>
     <xsl:apply-templates select="@*|node()"/>
   </xsl:copy>
 </xsl:template>

 <!-- Copy attributes unchanged -->
 <xsl:template match="@*" priority="-1">
   <xsl:copy/>
 </xsl:template>

During the build, these templates report every element used in the
xdocs that has no explicit template yet, so support can be added step
by step.
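
The same reporting idea can be sketched outside XSLT. The following is
only an illustration in Python, not part of the build system described
above; the SUPPORTED set and the sample document are made up for the
example.

```python
import xml.etree.ElementTree as ET

# Hypothetical subset of DocBook elements that already have templates.
SUPPORTED = {"article", "title", "para"}


def unsupported_elements(xml_text, supported=SUPPORTED):
    """Return the sorted names of elements that lack explicit support."""
    root = ET.fromstring(xml_text)
    # root.iter() walks the root element and all of its descendants.
    return sorted({el.tag for el in root.iter() if el.tag not in supported})


# Made-up sample document for the sketch.
sample = (
    "<article><title>T</title>"
    "<para>Uses <emphasis>x</emphasis></para><note/></article>"
)
print(unsupported_elements(sample))
```

Running a check like this over all xdocs gives the list of elements
still waiting for a template.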

Giacomo

> [sorry for cross-post: this is a general issue, but I'd like the cocoon
> people to know what I'm doing so that they might give me a hand :)]
>
> I started the effort that will, hopefully, bring us a much more useful
> documentation system for xml.apache.org and, hopefully, for the entire
> ASF, even if political and ego obstacles get in the way.
>
> I personally don't care: this effort is mainly to create a better
> documentation infrastructure following the goals outlined below. I
> started the Cocoon project three years ago for exactly this reason and,
> now that it has all the features I need, I think I can attack the problem
> from a very wide angle.
>
> The site building system will be targeted toward xml.apache.org, but
> I'll keep a very broad perspective, making it possible to adapt the
> system to other apache.org projects with very few changes.
>
> BIG DISCLAIMER: however, whether this happens or not, I personally don't
> care. For sure, don't count on me wasting my time on fighting about 'my
> DTD is better than yours' or 'my system is
> faster/smaller/cleaner/easier-to-use/more-extensible than yours'.
>
> I'll come up with a system that works and then you guys will vote on
> what to do. I consider this an exercise to show Cocoon's full
> potential (which, objectively, beats the pants off all the other
> systems used around Apache) but nothing more than this.
>
>                                     - o -
>
> Ok, now that I stated this, let's get into the effort goals.
>
> GOALS
> -----
>
> 1) Speed: current xml.apache.org is slow. Empirical studies on learning
> processes indicate that if a page takes more than 10 seconds to load on
> a 56 kbps modem, the cognitive experience is degraded.
>
> 2) Coherence: current xml.apache.org is extremely incoherent. Again,
> it's easy to understand that a lack of coherence between subprojects'
> docs is perceived as (and sometimes reflects!) a lack of cooperation.
>
> 3) Navigation: the navigation experience on current xml.apache.org is a
> nightmare. There is no way to perceive the basic elements of spatial
> navigation: where am I? where can I go? how do I go back? how do I go
> there?
>
> 4) Depth: the current xml.apache.org page layout forces a flat hierarchy
> of levels. The current Cocoon documentation somewhat extends this, but
> the visual look doesn't reflect the notion. Visual codes are extremely
> important to allow easy and immediate navigation even at the deepest
> level.
>
> 5) Usefulness: xml.apache.org contains powerful software but it's not
> powerful in itself. It should be a window on the information useful for
> both users and developers, along with friendly behavior, such as
> print-friendly versions of the single pages and of the whole
> per-subproject documentation, pagination of long articles,
> site-restricted search, graphs of project-related data and so on.
>
> 6) Simplicity: xml.apache.org is done by volunteers, on all levels.
> Nobody is directly paid to do this. Not even myself. So, if the above
> goals are met, but the system is not simple and immediate to use for
> those who have to maintain and update the information, the result will
> become worthless within a short period of time.
>
> 7) Extensibility, Flexibility, Modularity: web sites, just like software,
> are living entities that adapt to their environment. The build system
> must not restrict the ability to extend the information architecture
> in an evolutionary way.
>
> 8) URI Solidity and Future Compatibility: URIs are contracts between the
> publisher and the user. Human users have the ability to estimate the
> long-term validity of these contracts and 'route around' any broken
> links, while machine users do not. The goal is to come up with a system
> that can generate a web site with strong URIs.
>
>
> Design Decisions
> ----------------
>
> staticity: even if I think that the availability of a dynamic publishing
> system would be beneficial, considering the web site load, the load of
> the apache machines and the state of the JVM for FreeBSD and the
> political problems behind all this, it's *much* easier (at least for
> now) to have a static version of the site batch-produced and then placed
> into the web-serving space.
>
> automaticity: the site will be automatically generated out of files
> stored in CVS. The idea is to have GUMP-like nagging features that
> send email to the various mail lists, using XML validation to estimate
> the 'integrity' of the docs placed there.
>
> For this reason, in honor of Sam Ruby's great work, and for the
> resonance with 'forest', thus a huge number of trees (i.e. XML files),
> I call this effort "Forrest".
>
> I believe that together, Forrest and Gump will help bring apache
> quality one step up (moreover, as in the name, forrest wraps gump and
> will publish its generated data, providing more overall coherence).
>
>                                         - o -
>
> separation of concerns
> ----------------------
>
> There are three concern islands, here is a list of their duties.
>
> subproject
> ==========
>
> each subproject should provide:
>
> 3.a) a 'description' file that includes information on the codebase, its
> description, its released versions, its CVS modules, its CVS tags, its
> mail lists and its documentation sets (yes, a subproject might have more
> than one, think of Xerces1/Xerces2, Xalan1/Xalan2, Cocoon1/Cocoon2).
> [proposed filename: /description.xml]
>
> 3.b) a 'committers info' file that includes information on the
> committers, along with a short bio, an email address and a picture of
> them. [proposed filename: /committers.xml]
>
> 3.c) a 'change log' file that includes information on changes and
> software releases [proposed filename: /changes.xml]
>
> 3.d) a 'todo list' file that includes the information on things to do
> and who volunteered to do them [proposed filename: /todo.xml]
>
> 3.e) a 'news' file that includes events and useful information that
> should be made available to the general public.
>
> then, for each documentation set (its location is taken from the
> description file):
>
> 3.f) a 'table of content' that indicates the hierarchical sequence of
> the files and where to find them in the CVS repository (for each
> documentation). This is kept as a single file to allow document writers
> to maintain 'coherence' and visualize the entire part. This is
> equivalent to the stylebook book.xml file but with full nesting
> capabilities.
>
> 3.g) the pages that compose the documentation (their location is taken
> from the ToC file)
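
To make the idea of a nested ToC file concrete, here is a minimal
Python sketch. The element and attribute names (toc, section, page,
href, title) and the sample paths are assumptions made for the example;
the proposal above does not define the ToC format yet.

```python
import xml.etree.ElementTree as ET

# Hypothetical ToC document; the real DTD is still to be agreed upon.
TOC = """
<toc>
  <page href="index.xml" title="Overview"/>
  <section title="User Guide">
    <page href="guide/install.xml" title="Installing"/>
    <section title="Advanced">
      <page href="guide/advanced/caching.xml" title="Caching"/>
    </section>
  </section>
</toc>
"""


def flatten(node, trail=()):
    """Yield (breadcrumb, href) for every page, depth first."""
    for child in node:
        if child.tag == "page":
            yield trail + (child.get("title"),), child.get("href")
        elif child.tag == "section":
            # Sections may nest to any depth; extend the breadcrumb.
            yield from flatten(child, trail + (child.get("title"),))


pages = list(flatten(ET.fromstring(TOC.strip())))
for crumb, href in pages:
    print(" > ".join(crumb), "->", href)
```

The breadcrumb that falls out of the nesting is the kind of "where am
I?" information the navigation goal asks for.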
>
> Log scanner
> ===========
>
> The log scanner is a set of scripts that scan the logs from the CVS, the
> mail lists and the web site to gather information on:
>
>  1) mail list activity (subscribers and messages)
>  2) web site activity (hits and downloads)
>  3) CVS activity (general commits, commits per person)
>
> This scanner provides this information in a simple format that can be
> easily fed into the documentation building system.
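
As a rough illustration of the scanner's output, the Python sketch
below counts commits per person. The one-line-per-commit log format and
the sample entries are invented for the example; real CVS logs look
different and would need their own parsing.

```python
from collections import Counter

# Hypothetical simplified log: "<date> <committer> <module>" per commit.
LOG = """\
2001-12-14 stefano xml-cocoon2
2001-12-15 giacomo xml-cocoon2
2001-12-15 stefano xml-site
"""


def commits_per_person(log_text):
    """Count commits per committer, skipping malformed lines."""
    counts = Counter()
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            counts[parts[1]] += 1
    return counts


print(commits_per_person(LOG).most_common())
```

A table like this is simple enough to be written out in whatever format
the build system ends up consuming.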
>
> Build system
> ============
>
> The build system will:
>
> 1) aggregate, filter and otherwise adapt the information collected from
> the various subprojects' CVS modules, from the log scanner and from the
> GUMP run into static HTML files (for the browser pages), static PDF
> files (for print-friendly versions) and JPEG images (for graphs).
>
> 2) generate navigation information in all the pages
>
> 3) check validation of all the required XML files and send nag messages
> to the mail lists if failure occurs.
>
> 4) generate httpd-related corollary files (.htaccess, header.html,
> footer.html and so on).
>
> 5) upload the parts that didn't have failures online.
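
Steps 3 and 5 can be sketched together in Python: check each document,
collect a nag message for each failure, and keep only the clean ones
for upload. Note the assumptions: the standard library parser used here
only checks well-formedness, not validity against a DTD, and the file
names and contents are made up.

```python
import xml.etree.ElementTree as ET


def check_documents(docs):
    """Split {path: xml_text} into publishable paths and nag messages."""
    publishable, nags = [], []
    for path, text in sorted(docs.items()):
        try:
            # Well-formedness check only; DTD validation needs another tool.
            ET.fromstring(text)
            publishable.append(path)
        except ET.ParseError as err:
            nags.append(f"{path}: {err}")
    return publishable, nags


# Hypothetical inputs: one good document and one with a mismatched tag.
docs = {
    "cocoon/index.xml": "<document><title>ok</title></document>",
    "xalan/index.xml": "<document><title>broken</document>",
}
ok, nags = check_documents(docs)
print("upload:", ok)
print("nag:", nags)
```

Only the paths in the first list would go online; the messages in the
second list are what would be mailed to the lists.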
>
> The goal is to have the system run completely autonomously: this
> follows the Gump approach. [Sam, I'll need your help here, since I don't
> have an account on nagoya]
>
>                                         - o -
>
> Things to decide
> ================
>
> 1) DTDs
> -------
>
> The Cocoon project already has DTDs for 'documentation', 'change
> logs', 'todo list' and 'specifications'. They mainly use XHTML tags and
> are very easy to learn (they are an expansion of the original stylebook
> DTDs, so it's pretty easy to automatically adapt existing stylebook
> documents to this improved DTD, still keeping the simplicity we had
> before).
>
> The rest of the required DTDs (description, news and ToC) must be agreed
> upon (I'll work on them in the next few days).
>
> 2) URIs
> -------
>
> In order to achieve the future-compatible goal, we must come up with a
> guideline for URIs.
>
> For example, the Cocoon project had /cocoon and /cocoon2, then Cocoon
> 2.0 was released final and we moved /cocoon2 into /cocoon and /cocoon
> into /cocoon1, creating a shit-load of broken links.
>
> Two solutions were proposed (add your own if you have more):
>
>  a) use version-specific information and use mod_rewrite to adapt, for
> example:
>
>  xml.apache.org/cocoon/1.8.2/index.html
>  xml.apache.org/cocoon/2.0b1/index.html
>  xml.apache.org/cocoon/2.0b2/index.html
>  xml.apache.org/cocoon/2.0rc1/index.html
>  xml.apache.org/cocoon/2.0rc2/index.html
>  xml.apache.org/cocoon/2.0/index.html
>
> then
>
>  xml.apache.org/cocoon/ -> xml.apache.org/cocoon/2.0/index.html
>
> The problem is that while those versioned URIs never break, the
> version-less redirected URI changes with each release, so it doesn't
> reduce broken links. Also, it's probably easier to download the required
> version and look into the shipped docs, and keeping every version online
> results in unnecessarily big web sites.
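
The redirect in option a) would be done by mod_rewrite on the server.
The Python function below is only a stand-in to show the mapping itself;
the function name and the release list handling are assumptions for the
sketch.

```python
# Releases from the example above, oldest first; the last one is current.
RELEASES = ["1.8.2", "2.0b1", "2.0b2", "2.0rc1", "2.0rc2", "2.0"]


def resolve(path, project="cocoon", releases=RELEASES):
    """Map a version-less path to the latest release; keep versioned ones."""
    prefix = f"/{project}/"
    if not path.startswith(prefix):
        return path
    rest = path[len(prefix):]
    first = rest.split("/", 1)[0]
    if first in releases:
        return path  # already versioned, never rewritten
    return f"{prefix}{releases[-1]}/{rest or 'index.html'}"


print(resolve("/cocoon/"))
print(resolve("/cocoon/1.8.2/index.html"))
```

Each new release changes the last entry of the list, which is exactly
why the version-less URI keeps moving under this scheme.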
>
>  b) use semantically meaningful yet version-less URIs
>
>   xml.apache.org/cocoon/previous/ -> points to the previous generation
> docs
>   xml.apache.org/cocoon/ -> points to the latest docs
>   xml.apache.org/cocoon/next/ -> points to the next generation docs
>
> which removes the need to keep all the doc versions online, yet
> provides the ability to have both the latest version and the
> previous generation (for Cocoon that would be Cocoon 1.8.2, Cocoon 2.0
> and Cocoon 2.1-dev today).
>
> The problem of broken links isn't solved, since every time there is a
> transition, there is a chance of breaking previously established links
> if the docs ToC changes from one generation to the next.
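
The transition problem in option b) can be shown with a small Python
sketch. The alias names follow the previous/latest/next scheme above;
the release numbers after the transition and the page names are
hypothetical.

```python
def release(aliases, released_as, new_next):
    """Shift the generation aliases when the 'next' docs are released."""
    return {
        "previous": aliases["latest"],
        "latest": released_as,
        "next": new_next,
    }


def broken_links(old_pages, new_pages):
    """Pages in the old 'latest' ToC that the new 'latest' ToC lacks."""
    return sorted(set(old_pages) - set(new_pages))


aliases = {"previous": "1.8.2", "latest": "2.0", "next": "2.1-dev"}
after = release(aliases, "2.1", "2.2-dev")
print(after)

# Hypothetical page lists for the old and new 'latest' generation.
old_toc = ["index.html", "sitemap.html", "xsp.html"]
new_toc = ["index.html", "sitemap.html", "logicsheets.html"]
print(broken_links(old_toc, new_toc))
```

Any page reported by the second call is a link that worked under
/cocoon/ before the transition and no longer does.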
>
> 3) layout
> ---------
>
> The layout previously proposed on this list was a solution to the speed
> problem but I couldn't adapt it to the depth needs identified in the
> rest of the goals.
>
> So, I resurrected my rusty web design skills and came up with the layout
> you find attached. I've tested it on IE 5.5, NS 4.78 and Moz 0.9.5 on
> win2k.
>
> Feedback, suggestions and criticisms are appreciated.
>
> 4) CVS location and mail list discussions
> -----------------------------------------
>
> Just like Gump, which is not a subproject of its own, Forrest doesn't
> deserve that status either as long as it remains a one-man show (and
> my experience tells me it will very likely remain so if the above goals
> are met).
>
> At the same time, just like Gump, it requires a CVS space.
>
> Possible places are:
>
>  1) xml-site
>  2) xml-forrest
>  3) xml-site2
>
> for mail list discussions, solutions are:
>
>  1) [EMAIL PROTECTED]
>  2) [EMAIL PROTECTED]
>  3) [EMAIL PROTECTED]
>  4) [EMAIL PROTECTED]
>
> Please, add your comments/suggestions and your votes where a decision is
> required.
>
> Thank you.
>
>

