On 1/29/06, Bob Harner <[EMAIL PROTECTED]> wrote:
> As briefly discussed on the user list recently (subject: "Losing
> hyperlinks - what xsl removes them?"), the LinkRewritingTransformer
> seems to need some improvements so that it can rewrite all types of
> links.  It currently only rewrites <a href="foo"> where foo is a
> document-relative URI.  I'm sure I'm NOT the best person to do so
> (being much less familiar with 1.4 than 1.2.x), but I've been looking
> over the code and humbly offer the following initial thoughts.  Your
> advice and guidance are eagerly sought...

I am not certain how and why the LinkRewritingTransformer works, but
it fixed something when I was setting up the httpd proxying.  I was
new to Lenya and just accepted the magic.

> 1) <editorial>We have really overloaded the word "resource" in Lenya &
> Cocoon, haven't we?  Sometimes it means "an asset or a CMS document"
> (per http://wiki.apache.org/lenya/ProposalArchitecture), or sometimes
> it specifically means just an asset (per Resource.java).  The word is also
> used in sitemap files to refer to a reusable part of a pipeline.
> Elsewhere it refers vaguely to a "miscellaneous related file" (the
> lenya/resources dir).  Sometimes it means the amount of memory, hard
> drive space, and CPU cycles available.  And Document Types are now
> officially Resource Types. This overloading of terminology makes it
> harder to learn Lenya. I think "Content", "Content Item", and "Content
> Type" are probably much better terms for a CMS to use.  Precise and
> unambiguous terminology is always a good thing.</editorial>

Cocoon seems to put no thought into their naming schemes (or they were
being deliberately malicious and confusing).  They refer to the main
processing files (which I call XMAPs) as "Sitemaps", even though that
term had a well-defined and accepted meaning before Cocoon was started
(1998).  Given their usage, they could be named "Routes" (keeping the
"map" namespace).

"Document" is a well-defined term, especially with XML.

"Document Type" is fine.  XML uses "Document Type Definition", which
implies that "Documents" that share a "Definition" are the same
"Type".  This should not be the same as a "Resource Type"; all XML
Documents would have the Resource Type of "Document".  (Text documents
are "Assets" and have "Asset Types" of "Text", "HTML", "CSS", et
cetera.)

The new "repo" code is using "Content" and variants.  To me, "Content"
refers to the main part of the "Document".  There may be additional
information (referred to as "META") that is not part of the "Content".
Even the Document Identifiers (IDs, UNIDs, URLs) are part of the META
Information, not part of the "Content".  Lenya 1.2 used "Content" for
the collection of all Documents, but "Documents" or "Resources" would
be a better name for that collection.
The old (1.2) layout was:
/Publication/Content/Area/Document/Language
I expected the new layout to be:
/Publication/Resources/Document/Language/Version/Content

"Asset" is a great term for any data that is not an (XML) Document.

"Asset Type" should refer to the Type of Asset: Graphic, HTML, CSS,
PDF, et cetera.

"Resource" could be the superset of Documents and Assets.  I agree
that in computers, "Resources" usually refers to hardware
capabilities, but it is acceptable to overload it with a
software-specific meaning.  The only alternate word I could find in a
thesaurus that was not already used was "Stuff", which may not be
acceptable in an enterprise application.  If we accept this definition
of "Resource", then "Resource Type" defines whether a Resource is a
Document or an Asset.

With "map:resource", Cocoon again reused a well-defined term for
something other than its accepted definition.  When using GOSUB, the
called code is referred to as a "Subroutine".  This has been the
preferred word for decades; why didn't Cocoon use it?

> 2) As Andreas said a couple weeks ago, "It's about time to handle
> documents and assets in the same way".  I think there is a need for a
> common interface shared by both CMS documents and assets, so both can
> be handled uniformly -- particularly for link rewriting, where the
> URIs of both CMS documents and assets need to be rewritten in the
> same way.  This would be, perhaps, "ContentItem".  And both Document
> and Resource (which maybe should be named Asset?) should implement
> this interface, and DefaultDocument and Resource should extend a
> DefaultContentItem class.  Or is there a better idea?

Using the above definitions, the layouts would be:
/Publication/Resources/Document/Language/Version/Content
/Publication/Resources/Asset/Version
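With both layouts defined in one place, code such as link rewriting could treat Documents and Assets uniformly.  A minimal sketch, assuming a hypothetical RepositoryPaths helper (not Lenya code) and an integer Version:

```java
import java.util.StringJoiner;

// Hypothetical helper building repository paths for the proposed layouts:
//   /Publication/Resources/Document/Language/Version/Content
//   /Publication/Resources/Asset/Version
final class RepositoryPaths {
    private RepositoryPaths() {}

    static String documentContentPath(String publication, String document,
                                      String language, int version) {
        return join(publication, "Resources", document, language,
                String.valueOf(version), "Content");
    }

    static String assetPath(String publication, String asset, int version) {
        return join(publication, "Resources", asset, String.valueOf(version));
    }

    // Joins segments with "/" and a leading "/", rejecting empty segments
    // and segments that would silently add extra path levels.
    private static String join(String... segments) {
        StringJoiner path = new StringJoiner("/", "/", "");
        for (String segment : segments) {
            if (segment.isEmpty() || segment.contains("/")) {
                throw new IllegalArgumentException("bad path segment: " + segment);
            }
            path.add(segment);
        }
        return path.toString();
    }
}
```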

> 3) I think maybe the link rewriting should be done when a CMS document
> is published, deactivated, or exported, rather than every time it is
> displayed.  This change would be a performance boost for every page.
> Or am I missing something in why it needs to be done at display time?

Link rewriting should use an index to translate the UNID of the
Resource to a human-readable URL.  This information could be stored in
the Resource, but would be better stored in an Index.
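A minimal sketch of the index idea.  It assumes stored links use a hypothetical "unid:<id>" form and that the index is a simple in-memory map; a real index would presumably need to be persisted and updated whenever a Resource moves.  The same rewrite could run once at publish time, as the question suggests, or at display time.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical UNID-to-URL index; the "unid:" link form is an assumption,
// not existing Lenya behaviour.
final class UnidLinkIndex {
    private static final Pattern UNID_LINK = Pattern.compile("unid:([A-Za-z0-9-]+)");
    private final Map<String, String> urlByUnid = new HashMap<>();

    void put(String unid, String url) {
        urlByUnid.put(unid, url);
    }

    // Replaces every "unid:<id>" with the human-readable URL from the index.
    String rewrite(String markup) {
        Matcher m = UNID_LINK.matcher(markup);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String url = urlByUnid.get(m.group(1));
            // Leave unknown UNIDs untouched so broken links stay visible.
            String replacement = (url != null) ? url : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Because only the index changes when a Resource is renamed or moved, stored links keep pointing at the UNID and never go stale.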

> 4) LinkRewritingTransformer relies heavily on the
> DefaultDocumentBuilder class, whose isDocument() method simplistically
> returns true for any URLs starting with "/lenya/mypub/authoring/"
> even if the URL points to an asset, not a CMS document.  In contrast,
> note that the sitemaps verify that the URL ends in ".html" before
> assuming that a URL is really a CMS document.  Should
> DefaultDocumentBuilder's isDocument() method be changed to look for
> the ".html" ending?  (But do CMS documents *always* have an ".html"
> ending?)

That sounds like some of the core classes need revising.
isDocument() should be in the Resource class, as the parent of the
Document and Asset classes.  Document.isDocument() returns true.
Asset.isDocument() returns false.
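A minimal sketch of that hierarchy, using hypothetical classes rather than the current Lenya ones.  Each subclass answers isDocument() itself, so callers do not have to guess from a URL prefix or an ".html" ending (a test that also fails for URLs carrying a query string such as "index.html?lenya.usecase=...").

```java
// Hypothetical hierarchy: Resource is the common parent of Document and Asset.
abstract class Resource {
    private final String unid;

    protected Resource(String unid) {
        this.unid = unid;
    }

    String getUnid() {
        return unid;
    }

    // Each subclass states what it is; nothing is inferred from URLs.
    abstract boolean isDocument();
}

final class Document extends Resource {
    Document(String unid) { super(unid); }

    @Override
    boolean isDocument() { return true; }
}

final class Asset extends Resource {
    Asset(String unid) { super(unid); }

    @Override
    boolean isDocument() { return false; }
}
```

Link rewriting could then look up the Resource for a UNID and ask it directly.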

solprovider
