On 1/29/06, Bob Harner <[EMAIL PROTECTED]> wrote:
> As briefly discussed on the user list recently (subject: "Losing
> hyperlinks - what xsl removes them?"), the LinkRewritingTransformer
> seems to need some improvements so that it can rewrite all types of
> links. It currently only rewrites <a href="foo"> where foo is a
> document-relative URI. I'm sure I'm NOT the best person to do so
> (being much less familiar with 1.4 than 1.2.x), but I've been looking
> over the code and humbly offer the following initial thoughts. Your
> advice and guidance are eagerly sought...

I am not certain how or why the LinkRewritingTransformer works, but it
fixed something when I was setting up the httpd proxying. I was new to
Lenya and just accepted the magic.

> 1) <editorial>We have really overloaded the word "resource" in Lenya &
> Cocoon, haven't we? Sometimes it means "an asset or a CMS document"
> (per http://wiki.apache.org/lenya/ProposalArchitecture), or sometimes
> it is specifically just an asset (per Resource.java). The word is also
> used in sitemap files to refer to a reusable part of a pipeline.
> Elsewhere it refers vaguely to a "miscellaneous related file" (the
> lenya/resources dir). Sometimes it means the amount of memory, hard
> drive space, and CPU cycles available. And Document Types are now
> officially Resource Types. This overloading of terminology makes it
> harder to learn Lenya. I think "Content", "Content Item", and "Content
> Type" are probably much better terms for a CMS to use. Precise and
> unambiguous terminology is always a good thing.</editorial>

Cocoon seems to put no thought into its naming schemes (or was being
deliberately malicious and confusing). It refers to the main processing
files (which I call XMAPs) as "Sitemaps", even though that term had a
well-defined and accepted meaning before Cocoon was started (1998).
Given their usage, they could be named "Routes" (keeping the "map"
namespace).

"Document" is a well-defined term, especially with XML. "Document Type"
is fine. XML uses "Document Type Definition", which implies that
"Documents" that share a "Definition" are the same "Type". This should
not be the same as a "Resource Type"; all XML Documents would have the
Resource Type of "Document". (Text documents are "Assets" and have
"Asset Types" of "Text", "HTML", "CSS", et cetera.)

The new "repo" code is using "Content" and variants. To me, "Content"
refers to the main part of the "Document". There may be additional
information (referred to as "META") that is not part of the "Content".
Even the Document Identifiers (IDs, UNIDs, URLs) are part of the META
Information, not part of the "Content". Lenya 1.2 defined Content as
the collection of all Documents, but "Documents" or "Resources" is
better.

The old (1.2) version was:
  /Publication/Content/Area/Document/Language
I expected the new version to be:
  /Publication/Resources/Document/Language/Version/Content

"Asset" is a great term for any data that is not an (XML) Document.
"Asset Type" should refer to the Type of Asset: Graphic, HTML, CSS,
PDF, et cetera.

"Resource" could be the superset of Documents and Assets. I agree that
in computers, "Resources" usually refers to hardware capabilities, but
it is acceptable to overload it with a software-specific meaning. The
only alternate word I could find in a thesaurus that was not already
used was "Stuff", which may not be acceptable in an enterprise
application. Accepting this definition for "Resource", "Resource Type"
then defines whether a Resource is a Document or an Asset.

With "map:resource", Cocoon again reused a well-defined term for
something other than its accepted definition. When using GOSUB, the
called code is referred to as a "Subroutine". This has been the
preferred word for decades; why didn't Cocoon use it?

> 2) As Andreas said a couple weeks ago, "It's about time to handle
> documents and assets in the same way". I think there is a need for a
> common interface shared by both CMS documents and assets, so both can
> be handled uniformly -- particularly for link rewriting, where the
> URIs of both CMS documents and assets need to be rewritten in the
> same way. This would be, perhaps, "ContentItem". And both Document
> and Resource (which maybe should be named Asset?) should implement
> this interface and DefaultDocument and Resource should extend a
> DefaultContentItem class. Or is there a better idea?

(Using the above definitions)
  /Publication/Resources/Document/Language/Version/Content
  /Publication/Resources/Asset/Version

> 3) I think maybe the link rewriting should be done when a CMS document
> is published, deactivated, or exported, rather than every time it is
> displayed. This change would be a performance boost for every page.
> Or am I missing something in why it needs to be done at display time?

Link rewriting should use an index to translate the UNID of the
Resource to a human-readable URL. This information could be stored in
the Resource, but would be better stored in an Index.

> 4) LinkRewritingTransformer relies heavily on the
> DefaultDocumentBuilder class, whose isDocument() method simplistically
> returns true for any URLs starting like "/lenya/mypub/authoring/"
> even if the URL points to an asset, not a CMS document. In contrast,
> note that the sitemaps verify that the URL ends in ".html" before
> assuming that a URL is really a CMS document. Should
> DefaultDocumentBuilder's isDocument() method be changed to look for
> the ".html" ending? (But do CMS documents *always* have an ".html"
> ending?)

That sounds like some of the core classes need revising. isDocument()
should be in the Resource class, as the parent of the Document and
Asset classes. Document.isDocument() returns true. Asset.isDocument()
returns false.

solprovider
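
A rough sketch of the two suggestions above, in case it helps the
discussion: isDocument() declared on a common Resource parent, and a
separate index that translates a UNID to a human-readable URL. Nothing
here is existing Lenya code. The class names Resource, Document, and
Asset follow the definitions proposed in this message, while LinkIndex,
getUnid(), put(), urlFor(), and the sample UNIDs and URLs are all
hypothetical names made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical parent of Documents and Assets (not Lenya's Resource.java).
abstract class Resource {
    private final String unid;

    Resource(String unid) {
        this.unid = unid;
    }

    String getUnid() {
        return unid;
    }

    // Each subclass answers for itself; no URL pattern guessing is needed.
    abstract boolean isDocument();
}

// An (XML) Document is a Resource that is a document.
final class Document extends Resource {
    Document(String unid) {
        super(unid);
    }

    @Override
    boolean isDocument() {
        return true;
    }
}

// An Asset (graphic, CSS, PDF, ...) is a Resource that is not a document.
final class Asset extends Resource {
    Asset(String unid) {
        super(unid);
    }

    @Override
    boolean isDocument() {
        return false;
    }
}

// Hypothetical index kept outside the Resources: UNID -> human-readable URL.
final class LinkIndex {
    private final Map<String, String> urlByUnid = new HashMap<>();

    void put(Resource resource, String url) {
        urlByUnid.put(resource.getUnid(), url);
    }

    // Empty when the UNID is unknown, so a rewriter can leave the link alone.
    Optional<String> urlFor(String unid) {
        return Optional.ofNullable(urlByUnid.get(unid));
    }
}

class ResourceSketch {
    public static void main(String[] args) {
        Resource page = new Document("unid-0001");
        Resource logo = new Asset("unid-0002");

        LinkIndex index = new LinkIndex();
        index.put(page, "/mypub/live/index.html");
        index.put(logo, "/mypub/live/images/logo.png");

        for (Resource r : new Resource[] {page, logo}) {
            System.out.println(r.getUnid()
                    + " isDocument=" + r.isDocument()
                    + " url=" + index.urlFor(r.getUnid()).orElse("(not indexed)"));
        }
    }
}
```

With this shape, a link rewriter would treat Documents and Assets the
same way: look up the UNID in the index and substitute the URL, without
checking for an ".html" ending.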
