Andreas Hartmann wrote:

Hi Lenya devs,

I'd like to raise an issue that has bothered me for quite a long time
and share some random thoughts.

Currently, Lenya is based on the following axioms
(please correct me if I'm wrong):

1. A URL is represented by exactly one document.


what do you mean by one document? In the case of the default publication this might be true,
but one can do it very differently


2. A document can be represented by an arbitrary number of URLs.


you mean like "softlinks"?


3. For each document, there is exactly one canonical URL.


what do you mean by canonical URL?



This is reflected in the following methods:

  DocumentBuilder.buildDocument(...)
  DocumentBuilder.buildCanonicalUrl(...)


if you use the DocumentBuilder, then I guess the above is correct, but I don't think
one has to use the DocumentBuilder



At the moment, the concept of multiple URLs per document is typically
used for language versions (foo_{defaultlanguage}.html = foo.html)
and to support different URL suffixes (foo, foo.htm, foo.html).

The site structure is currently tightly connected to the URL space.
Link URLs are derived directly from the site structure:

  <node id="foo">
    <node id="bar"/>
  </node>

is interpreted as

  /foo/bar

The language version is handled orthogonally to the site structure.
The URL is determined by combining both document ID and language.
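
A minimal sketch of the derivation described above, assuming the node ids are simply joined with "/" and the language is appended as a "_xx" suffix unless it is the default language. The class and method names are hypothetical and are not part of the Lenya API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a Lenya class: derives a URL from the
// site structure path and the language, as described in the text.
final class SiteUrlSketch {

    // Joins nested node ids into a document path: ["foo", "bar"] -> "/foo/bar".
    static String documentPath(List<String> nodeIds) {
        return "/" + String.join("/", nodeIds);
    }

    // Combines document path and language. Assumption: the default
    // language gets the short form (foo.html), others get foo_xx.html.
    static String url(List<String> nodeIds, String language, String defaultLanguage) {
        String suffix = language.equals(defaultLanguage) ? "" : "_" + language;
        return documentPath(nodeIds) + suffix + ".html";
    }

    public static void main(String[] args) {
        System.out.println(documentPath(Arrays.asList("foo", "bar")));
        System.out.println(url(Arrays.asList("foo"), "de", "en"));
    }
}
```

Note that the sketch only produces the short form for the default language, while the current behaviour described above accepts both foo.html and foo_{defaultlanguage}.html.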


If we want to allow multiple site structures, we have to choose between
the following options:

1. The connection between site structures and URL space is kept. This implies

   - a document has a different canonical URL for each site structure
   - calculating a document's URL depends on the site structure

2. The purpose of the site structure is reduced to building navigation
   widgets etc., the URL space is orthogonal to that.

   - a document has only one single canonical URL
   - the site structure stores the UUID of a document
   - navigating the site structure is not reflected 1:1 in the URL space


I am not sure if I understand you correctly, but I would say we should go with (2).
I guess it would become clearer if you make an example, e.g.

/en/developers/andreas-hartmann

/de/entwickler/andreas-hartmann

/en/committers/andi

/de/committers/andreas



Option (2) implies that, when a document is created, its URL and its location
in the site structure have to be determined. IMO this is just a GUI issue.
In most cases, a default site structure which corresponds to the URL space
will be used to create documents. These documents can be referenced from
other site structures later on.

I'm not particularly fond of the DocumentBuilder concept. With option (2)
and the default site structure it would be obsolete, because the document
could be derived directly from the default site structure. The ambiguity
that multiple, arbitrary URLs can point to a document would be removed.

----

The question is if multiple URLs for a document should be allowed at all.


sure, why not? I think there are many use cases for that and existing URL spaces
which couldn't be handled by Lenya if it doesn't support this...

Actually I don't think this is necessary. At the moment, many publications
show the following behaviour:

/foo.html       -> Hello World!
/foo_en.html    -> Hello World!
/foo_de.html    -> Hallo Welt!

Why is the support for /foo_en.html necessary? I see only two reasons:

1. Laziness. You don't have to find out the default language to create a URL.
2. You can switch the default language without creating dead URLs.

IMO neither of them outweighs the disadvantages of an ambiguous URL space.
In fact, (2) should probably be avoided because the content of a document
page changes (it becomes a different language version). So IMO it could look
like this:

/foo.html       -> Hello World!
/foo_en.html    -> 404
/foo_de.html    -> Hallo Welt!



what if you switch the default language to German? Then suddenly all foo_de URLs become 404?!


Actually this would simplify the URL mapping concept by merging document ID
(or better document path to avoid confusion with the UUID) and language.
In the site structure, there wouldn't be multiple language versions of a
document, but only links to documents. The connection between the actual
language versions of a document would be represented in another location
(see ContentNode and Document in o.a.l.cms.repo for more information).

Assuming we have two documents which are language versions of the
same content:

* language="en" uuid="1-en"
* language="de" uuid="1-de"

This could be represented for instance by the following default site structures:

1. /foo.html
   /foo_de.html

   <node id="foo" document-uuid="1-en"/>
   <node id="foo_de" document-uuid="1-de"/>

   (note that the language suffix "_de" is just a part of the URL)


I am not sure if this is a good idea and what the consequences are ... my belly tells me that it's a bad idea ;-)
(e.g. in the case of switching the default language)


2. /en/foo.html
   /de/foo.html

   <node id="en">
     <node id="foo" document-uuid="1-en"/>
   </node>
   <node id="de">
     <node id="foo" document-uuid="1-de"/>
   </node>

Assuming that a document can only be referenced once in the default site
structure, it is now trivial to map URLs to documents and vice versa, without
using a DocumentBuilder. The important fact is that the knowledge of how to map
URLs to documents belongs to the component which *creates* documents. That's
why there's no knowledge duplication if you hard-code that the German
version of /en/foo should be created at /de/foo.
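
The two-way lookup this enables can be sketched as follows. This is only an illustration under the stated assumption that each document is referenced at most once in the default site structure. The class and its methods are hypothetical and not taken from Lenya.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not a Lenya class: a default site structure
// that maps paths to document UUIDs and back.
final class DefaultSiteStructureSketch {

    private final Map<String, String> uuidByPath = new HashMap<>();
    private final Map<String, String> pathByUuid = new HashMap<>();

    // Registers a node. A second reference to the same path or the same
    // document is rejected, which keeps the mapping unambiguous.
    void add(String path, String uuid) {
        if (uuidByPath.containsKey(path) || pathByUuid.containsKey(uuid)) {
            throw new IllegalArgumentException(
                "path or document already referenced: " + path + " / " + uuid);
        }
        uuidByPath.put(path, uuid);
        pathByUuid.put(uuid, path);
    }

    // URL -> document; returns null if no node exists at this path.
    String uuidForPath(String path) {
        return uuidByPath.get(path);
    }

    // Document -> URL; returns null if the document is not referenced.
    String pathForUuid(String uuid) {
        return pathByUuid.get(uuid);
    }
}
```

With the second example structure above, add("/en/foo", "1-en") and add("/de/foo", "1-de") would make uuidForPath("/de/foo") return "1-de" and pathForUuid("1-en") return "/en/foo".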

----

Supporting the other case, multiple URL suffixes for a document, is certainly
necessary. But I'd separate this information from the document itself.
IMO the URL suffix should be used to request a certain view of a document:

/foo              -> HTML view
/foo.html         -> HTML view
/foo.pdf          -> PDF view
/foo.print.html   -> print HTML view (if CSS is not appropriate or whatever)


this might be one scheme, but others are possible as well. I think Lenya needs to allow flexibility here,
because otherwise you shut Lenya out from many URL spaces being used


The canonical URL of a document would be assembled from the canonical
base URL (/foo) and the extension denoting the view. This would be done
by the client code, the document itself (or whatever component knows the
document's URL) would just return the canonical base URL. (BTW, the term
canonical is not necessary anymore since only one base URL exists per document)
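
As a rough sketch of this split, assuming the view suffix is everything after the first dot in the last path segment: the view names returned here ("html", "pdf", "print-html") and the class itself are made up for illustration, and other schemes are of course possible.

```java
// Hypothetical sketch, not a Lenya class: separates the base URL of a
// document from the suffix that selects a view.
final class ViewUrlSketch {

    // Splits "/foo.print.html" into { "/foo", "print.html" }.
    // A URL without a suffix yields an empty suffix: { "/foo", "" }.
    static String[] split(String url) {
        int slash = url.lastIndexOf('/');
        int dot = url.indexOf('.', slash + 1);
        if (dot < 0) {
            return new String[] { url, "" };
        }
        return new String[] { url.substring(0, dot), url.substring(dot + 1) };
    }

    // Maps the suffix to a view name, following the table above.
    static String view(String url) {
        switch (split(url)[1]) {
            case "":
            case "html":
                return "html";
            case "pdf":
                return "pdf";
            case "print.html":
                return "print-html";
            default:
                return "unknown";
        }
    }

    // Client code assembles the full URL from base URL and view suffix.
    static String assemble(String baseUrl, String suffix) {
        return suffix.isEmpty() ? baseUrl : baseUrl + "." + suffix;
    }
}
```

In this sketch the document only has to know its base URL (/foo), and the caller decides which view to link to, e.g. assemble("/foo", "pdf").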

----

Another question: With multiple site structures, how does the system keep
track of the currently selected site structure?

  - URL prefix


that would be my first suggestion, similar to "context"  for servlets

  - request parameter
  - arbitrary sitemap-based implementation
    (e.g. using a matcher, the session etc.)
  - ...
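
The URL prefix variant could look roughly like the sketch below, assuming the first path segment names the site structure and the rest is the path inside it. The class name and the example prefix "main" are hypothetical.

```java
// Hypothetical sketch, not a Lenya class: selects the site structure
// from the first URL segment, similar to a servlet context path.
final class SiteStructurePrefixSketch {

    // Resolves "/main/foo/bar" to { "main", "/foo/bar" }.
    // A URL with only a prefix, e.g. "/main", resolves to { "main", "/" }.
    static String[] resolve(String url) {
        if (!url.startsWith("/")) {
            throw new IllegalArgumentException("URL must start with '/': " + url);
        }
        int second = url.indexOf('/', 1);
        if (second < 0) {
            return new String[] { url.substring(1), "/" };
        }
        return new String[] { url.substring(1, second), url.substring(second) };
    }
}
```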

----

Feel free to add your comments. I'll continue thinking about these issues.


I think it's best if we use a few real world examples, because then it becomes much clearer very quickly.

Michi


-- Andreas


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




--
Michael Wechner
Wyona      -   Open Source Content Management   -    Apache Lenya
http://www.wyona.com                      http://lenya.apache.org
[EMAIL PROTECTED]                        [EMAIL PROTECTED]

