On 1/6/06, Andreas Hartmann <[EMAIL PROTECTED]> wrote:
> Currently, Lenya is based on the following axioms
> 1. A URL is represented by exactly one document.
> 2. A document can be represented by an arbitrary number of URLs.
These are axioms of the Web. For a given URL at a given time, web
browsers can only receive one response. Web servers can send the same
response (such as error pages) to multiple URLs.
> 3. For each document, there is exactly one canonical URL.
> This is reflected in the following methods:
> DocumentBuilder.buildDocument(...)
> DocumentBuilder.buildCanonicalUrl(...)
I am not certain I understand. For each document, there must be one
primary URL. It is possible to have multiple URLs refer to the same
document. This is often handled by Apache httpd rewriting URLs so
both "/mydoc" and "/pub/live/mydoc" refer to the same document. In all
web servers, including Lenya, many URLs refer to the error pages.
There are other methods for making several URLs refer to the same
document, but typical usage does not find them useful. The easiest
method to implement is to add a property to the document at one
location which causes another document to override it:
<doc id="requestedDoc" override="responseDoc">
so "/requestedDoc" returns "/responseDoc". This is useful for
deprecating documents when pages are combined or renamed.
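
A minimal sketch of how such an override could be resolved, assuming the
override targets are held in a simple in-memory map. The class
OverrideResolver and its methods are hypothetical and not part of Lenya.
It follows override links until it reaches a document without one, and
refuses to loop if two documents override each other:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not Lenya API: resolves the "override" property. */
final class OverrideResolver {
    // requested document id -> id of the document that overrides it
    private final Map<String, String> overrides = new HashMap<>();

    void addOverride(String requestedId, String responseId) {
        overrides.put(requestedId, responseId);
    }

    /** Follows override links until a document without an override is found. */
    String resolve(String requestedId) {
        Set<String> seen = new HashSet<>();
        String current = requestedId;
        while (overrides.containsKey(current)) {
            if (!seen.add(current)) {
                throw new IllegalStateException("Override cycle at " + current);
            }
            current = overrides.get(current);
        }
        return current;
    }
}
```

Documents without an override resolve to themselves, so the lookup can
be applied to every request.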
> At the moment, the concept of multiple URLs per document is typically
> used for language versions (foo_{defaultlanguage}.html = foo.html)
> and to support different URL suffixes (foo, foo.htm, foo.html).
Language is a property required for determining the response. It must
be included in the request, or the default is used. Language does not
need to be in the URL; it could be handled by the session information
for the visitor.
But yes, Lenya can send the same response to different requests.
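
As a rough illustration of "included in the request, or the default is
used", here is a sketch that looks for the language in the URL suffix,
then in the session, then falls back to the default. The class
LanguageResolver, the two-letter suffix pattern, and the lookup order are
my assumptions, not existing Lenya behaviour:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not Lenya API: picks the language for a request. */
final class LanguageResolver {
    // A two-letter suffix such as "_de", optionally followed by an extension.
    private static final Pattern SUFFIX =
            Pattern.compile("_([a-z]{2})(\\.[A-Za-z0-9.]+)?$");

    /** Assumed order: URL suffix, then session value, then configured default. */
    static String resolve(String path, String sessionLanguage, String defaultLanguage) {
        Matcher m = SUFFIX.matcher(path);
        if (m.find()) {
            return m.group(1);
        }
        if (sessionLanguage != null && !sessionLanguage.isEmpty()) {
            return sessionLanguage;
        }
        return defaultLanguage;
    }
}
```

A document id that itself ends in an underscore and two letters would be
misread by this pattern, which is one reason to keep language as a
separate property.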
> The site structure is currently tightly connected to the URL space.
> Link URLs are derived directly from the site structure:
> <node id="foo">
>   <node id="bar"/>
> </node>
> is interpreted as
> /foo/bar
The hierarchical index must exist, but does not need to be reflected
in the data structure. We have discussed moving to a flat structure
for Document storage. In relational databases, even the hierarchical
index would be stored in a flat structure:
Key - ParentKey
DocumentID - ParentDocumentID
getParent(Key)
getAllChildren(ParentKey)
An alternative would allow a document to have several parents:
Key - ParentKey - IsPrimary
DocumentID - ParentDocumentID - True|False
getPrimaryParent(Key)
getAllParents(Key)
getAllChildren(ParentKey)
getAllPrimaryChildren(ParentKey)
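
To make the second table concrete, here is a small in-memory sketch of
the four lookups. The class ParentIndex is hypothetical; a real
implementation would sit on the database or JCR, and I assume each
document has at most one row marked as primary:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for the Key - ParentKey - IsPrimary table. */
final class ParentIndex {
    /** One row of the flat table. */
    private static final class Row {
        final String key;
        final String parentKey;
        final boolean primary;

        Row(String key, String parentKey, boolean primary) {
            this.key = key;
            this.parentKey = parentKey;
            this.primary = primary;
        }
    }

    private final List<Row> rows = new ArrayList<>();

    void add(String key, String parentKey, boolean isPrimary) {
        rows.add(new Row(key, parentKey, isPrimary));
    }

    /** Returns the primary parent, or null if the document has none. */
    String getPrimaryParent(String key) {
        for (Row row : rows) {
            if (row.key.equals(key) && row.primary) {
                return row.parentKey;
            }
        }
        return null;
    }

    List<String> getAllParents(String key) {
        List<String> parents = new ArrayList<>();
        for (Row row : rows) {
            if (row.key.equals(key)) {
                parents.add(row.parentKey);
            }
        }
        return parents;
    }

    List<String> getAllChildren(String parentKey) {
        List<String> children = new ArrayList<>();
        for (Row row : rows) {
            if (row.parentKey.equals(parentKey)) {
                children.add(row.key);
            }
        }
        return children;
    }

    List<String> getAllPrimaryChildren(String parentKey) {
        List<String> children = new ArrayList<>();
        for (Row row : rows) {
            if (row.parentKey.equals(parentKey) && row.primary) {
                children.add(row.key);
            }
        }
        return children;
    }
}
```

The single-parent table is the special case where every row is primary,
so getAllChildren and getAllPrimaryChildren return the same list.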
JCR allows more options for the structure, but we still have to decide
if we want the ability for a document to have several parents. This
functionality may be useful in rare cases. Is it worth the additional
complexity to make Lenya useful for those cases? It would be much
more difficult to add it later than to integrate it during the
migration to JCR.
> The language version is handled orthogonally to the site structure.
> The URL is determined by combining both document ID and language.
As I wrote above, language does not need to be in the URL. But it is
likely to remain there.
> If we want to allow multiple site structures, we have to choose between
> the following options:
> 1. The connection between site structures and URL space is kept. This implies
> - a document has a different canonical URL for each site structure
> - calculating a document's URL depends on the site structure
The flat structure offers too many improvements to give up. URLs can
be flexible. Different modules can look up the Document using
different methods. The "live" module will probably accept:
/primaryParent/docID
/alternateParent/docID
/docUNID
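
A sketch of how one lookup could accept all three forms, assuming a
one-segment path is a UNID and a two-segment path is parent plus docID.
The class LiveUrlResolver and its register method are hypothetical, and
the maps stand in for whatever the flat storage provides:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical helper, not Lenya API: maps "live" URLs to document ids. */
final class LiveUrlResolver {
    private final Map<String, String> docIdByUnid = new HashMap<>();
    private final Map<String, Set<String>> parentsByDocId = new HashMap<>();

    /** Registers a document with its UNID and every parent it appears under. */
    void register(String unid, String docId, String... parents) {
        docIdByUnid.put(unid, docId);
        Set<String> known = parentsByDocId.computeIfAbsent(docId, k -> new HashSet<>());
        for (String parent : parents) {
            known.add(parent);
        }
    }

    /** Accepts /parent/docID (primary or alternate parent) and /docUNID. */
    Optional<String> resolve(String path) {
        String trimmed = path.startsWith("/") ? path.substring(1) : path;
        String[] segments = trimmed.split("/");
        if (segments.length == 1) {
            return Optional.ofNullable(docIdByUnid.get(segments[0]));
        }
        if (segments.length == 2) {
            Set<String> parents = parentsByDocId.get(segments[1]);
            if (parents != null && parents.contains(segments[0])) {
                return Optional.of(segments[1]);
            }
        }
        return Optional.empty();
    }
}
```

Deeper hierarchies would need the whole parent chain checked; this only
shows that several URL forms can land on the same Document.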
> 2. The purpose of the site structure is reduced to building navigation
> widgets etc., the URL space is orthogonal to that.
> - a document has only one single canonical URL
> - the site structure stores the UUID of a document
> - navigating the site structure is not reflected 1:1 in the URL space
> Option (2) implies that, when a document is created, its URL and its location
> in the site structure have to be determined. IMO this is just a GUI issue.
> In most cases, a default site structure which corresponds to the URL space,
> will be used to create documents. These documents can be referenced from
> other site structures later on.
>
> I'm not particularly fond of the DocumentBuilder concept. With option (2)
> and the default site structure it would be obsolete, because the document
> could be derived directly from the default site structure.
Agreed.
> The ambiguity
> that multiple, arbitrary URLs can point to a document would be removed.
This is the part I do not understand. I do not see any ambiguity.
> ----
> The question is if multiple URLs for a document should be allowed at all.
> Actually I don't think this is necessary. At the moment, many publications
> show the following behaviour:
> /foo.html -> Hello World!
> /foo_en.html -> Hello World!
> /foo_de.html -> Hallo Welt!
>
> Why is the support for /foo_en.html necessary? I see only two reasons:
> 1. Laziness. You don't have to find out the default language to create a URL.
> 2. You can switch the default language without creating dead URLs.
> IMO both of them don't outweigh the disadvantages of an ambiguous URL space.
> In fact, (2) should probably be avoided because the content of a document
> page changes (it becomes a different language version). So IMO it could look
> like this:
> /foo.html -> Hello World!
> /foo_en.html -> 404
> /foo_de.html -> Hallo Welt!
Language must be a property of all documents. "/foo.html" is a
shortcut to the default.
Our 1.2 publication forces /foo.html to redirect the browser to
/foo_{currentLanguage}.html. We want the primary URL to be the only
URL displayed. If I were designing it now, I might move language
earlier in the URL using Apache httpd's rewriting. Or maybe not,
because Lenya1.2 is very focused on language being specified after the
docID, and the extra complexity does not add much value.
> Actually this would simplify the URL mapping concept by merging document ID
> (or better document path to avoid confusion with the UUID) and language.
> In the site structure, there wouldn't be multiple language versions of a
> document, but only links to documents. The connection between the actual
> language versions of a document would be represented in another location
> (see ContentNode and Document in o.a.l.cms.repo for more information).
>
> Assuming we have two documents which are language versions of the
> same content:
> * language="en" uuid="1-en"
> * language="de" uuid="1-de"
>
> This could be represented for instance by the following default site
> structures:
> 1. /foo.html
>    /foo_de.html
>    <node id="foo" document-uuid="1-en"/>
>    <node id="foo_de" document-uuid="1-de"/>
>    (note that the language suffix "_de" is just a part of the URL)
> 2. /en/foo.html
>    /de/foo.html
>    <node id="en">
>      <node id="foo" document-uuid="1-en"/>
>    </node>
>    <node id="de">
>      <node id="foo" document-uuid="1-de"/>
>    </node>
> Assuming that a document can only be referenced once in the default site
> structure, it is now trivial to map URLs to documents and vice versa, without
> using a DocumentBuilder. The important fact is that the knowledge how to map
> URLs to documents belongs to the component which *creates* documents. That's
> why there's no knowledge duplication if you hard-code that the German
> version of /en/foo should be created at /de/foo.
There was a discussion about moving language earlier in the URL and
site structure:
/pub/area/doc_language.extension ->
/pub/area/language/doc.extension
My initial reaction was it did not matter to me. The URL does not
matter, because Lenya can retrieve the language from multiple places
(after the underline, anywhere in the URL, session information,
configured default), as long as the parameter is available.
It does matter to the site structure.
The old structure is:
docID/language (implemented by {docID}/index_{language}.html)
Moving language earlier (/language/docID) or combining it with the
docID (/docID-language) would lose some abilities.
Combining it with the docID makes most language functions very
complex. There are good reasons for maintaining the property
separately. (I agree the "/index" part of Lenya1.2's structure was
unnecessary, but that will disappear with JCR.)
Having the language earlier in the site structure makes
removeLanguage() extremely easy and efficient. But that function is
rarely used, so efficiency does not matter.
Keeping the old structure (or modifying it in JCR so "Language" is
under "Document" either as a Property of all Versions, or as a
subnode) allows createAllLanguageVersions() to be easy and efficient.
That function would be used often in any multiple language
publication. One of Lenya's marketing points is good support for
multiple languages, so that is important.
> ----
> Supporting the other case, multiple URL suffixes for a document, is certainly
> necessary. But I'd separate this information from the document itself.
> IMO the URL suffix should be used to request a certain view of a document:
> /foo -> HTML view
> /foo.html -> HTML view
> /foo.pdf -> PDF view
> /foo.print.html -> print HTML view (if CSS is not appropriate or whatever)
>
> The canonical URL of a document would be assembled from the canonical
> base URL (/foo) and the extension denoting the view. This would be done
> by the client code, the document itself (or whatever component knows the
> document's URL) would just return the canonical base URL. (BTW, the term
> canonical is not necessary anymore since only one base URL exists per
> document)
Agreed. The Document Node contains XML. The extension should be used
by docType modules to determine what processing is required.
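
For illustration, splitting a request into base URL and view could look
like the sketch below. The class ViewRequest is hypothetical, and I
assume that everything after the first dot in the last path segment
names the view, with "html" as the default when there is no extension:

```java
/** Hypothetical helper, not Lenya API: separates the base URL from the view. */
final class ViewRequest {
    final String basePath;
    final String view;

    private ViewRequest(String basePath, String view) {
        this.basePath = basePath;
        this.view = view;
    }

    /** "/foo.print.html" becomes base "/foo" and view "print.html". */
    static ViewRequest parse(String path) {
        int slash = path.lastIndexOf('/');
        // Only look for the dot inside the last path segment.
        int dot = path.indexOf('.', slash + 1);
        if (dot < 0) {
            return new ViewRequest(path, "html");
        }
        return new ViewRequest(path.substring(0, dot), path.substring(dot + 1));
    }
}
```

The document only needs to know its base URL; the docType module would
decide what to do with the view string.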
> ----
> Another question: With multiple site structures, how does the system keep
> track of the currently selected site structure?
> - URL prefix
> - request parameter
> - arbitrary sitemap-based implementation
> (e.g. using a matcher, the session etc.)
> - ...
I think you mean using multiple Views. The documents will use a flat
storage. The "normal" hierarchical View is for documents to appear
under their parent. Additional Views could be specified (although the
only one I can imagine quickly would be an "Only Primary Children" View
if we allow documents to be children of multiple parents.) The View
choice would be specified and used by the module, especially if we use
the "Area" part of the URL to specify the module. Examples:
/pub/live/docID uses the "live" module which displays a Document while
using the hierarchical View for menus.
/pub/map[/docID] uses the "map" module which uses the hierarchical
View to display the entire structure with the optional document
highlighted.
/pub/titles[/docID] uses the "titles" module which uses the flat View
to display the entire structure sorted alphabetically by Title with
the optional document highlighted.
It might be better to have an "index" module which takes the sort as
a parameter (the optional docID is used for highlighting "You are
Here"):
/pub/index/titles/docID
/pub/index/created
/pub/index/published (last published at bottom)
/pub/index/published-reverse (last published at top)
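
A sketch of how one module could serve all four sorts from the flat
View. The class IndexModule, the Doc fields, and the use of plain long
timestamps are assumptions for illustration only:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical "index" module, not Lenya API: one flat list, several sorts. */
final class IndexModule {
    /** Minimal stand-in for a Document in flat storage. */
    static final class Doc {
        final String id;
        final String title;
        final long created;
        final long published;

        Doc(String id, String title, long created, long published) {
            this.id = id;
            this.title = title;
            this.created = created;
            this.published = published;
        }
    }

    /** Maps the sort segment of the URL to an ordering. */
    static Comparator<Doc> comparatorFor(String sort) {
        switch (sort) {
            case "titles":
                return Comparator.comparing((Doc d) -> d.title, String.CASE_INSENSITIVE_ORDER);
            case "created":
                return Comparator.comparingLong((Doc d) -> d.created);
            case "published":
                // Oldest first, so the last published document is at the bottom.
                return Comparator.comparingLong((Doc d) -> d.published);
            case "published-reverse":
                return Comparator.comparingLong((Doc d) -> d.published).reversed();
            default:
                throw new IllegalArgumentException("Unknown sort: " + sort);
        }
    }

    /** Returns document ids in the requested order without touching the input. */
    static List<String> index(List<Doc> docs, String sort) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(comparatorFor(sort));
        List<String> ids = new ArrayList<>();
        for (Doc d : sorted) {
            ids.add(d.id);
        }
        return ids;
    }
}
```

Adding another sort is then one more case rather than another module.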
solprovider