[EMAIL PROTECTED] wrote:
On 1/6/06, Andreas Hartmann <[EMAIL PROTECTED]> wrote:

Currently, Lenya is based on the following axioms:
1. A URL is represented by exactly one document.
2. A document can be represented by an arbitrary number of URLs.


These are axioms of the Web.  For a given URL at a given time, web
browsers can only receive one response.  Web servers can send the same
response (such as error pages) to multiple URLs.

Sorry, my terms were too generic in their meaning. With "document", I
mean o.a.l.cms.publication.Document object. This makes my statement
more specific.


3. For each document, there is exactly one canonical URL.
This is reflected in the following methods:
  DocumentBuilder.buildDocument(...)
  DocumentBuilder.buildCanonicalUrl(...)


I am not certain I understand.  For each document, there must be one
primary URL.  It is possible to have multiple URLs refer to the same
document.  This is often handled by Apache httpd rewriting URLs so
both "/mydoc" and "/pub/live/mydoc" refer to the same document. In all
web servers, including Lenya, many URLs refer to the error pages.
There are other methods for making several URLs refer to the same
document, but typical usage does not find them useful.  The easiest
method to implement is to add a property to the document at one
location which causes another document to override it:
<doc id="requestedDoc" override="responseDoc">
so "/requestedDoc" returns "/responseDoc".  This is useful for
deprecating documents when pages are combined or renamed.
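
To illustrate the override idea above, here is a minimal sketch. It
assumes overrides are kept as a simple id-to-id map; the class and
method names are hypothetical and not part of the Lenya API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the override="..." property; not Lenya code. */
final class OverrideSketch {

    // requested document id -> id of the document that answers instead
    private final Map<String, String> overrides = new HashMap<>();

    void override(String requestedId, String responseId) {
        overrides.put(requestedId, responseId);
    }

    /** Follows override links until a document without an override is reached. */
    String resolve(String docId) {
        Set<String> seen = new HashSet<>();
        String current = docId;
        while (overrides.containsKey(current)) {
            // Guard against a -> b -> a loops in the configuration.
            if (!seen.add(current)) {
                throw new IllegalStateException("Override cycle at " + current);
            }
            current = overrides.get(current);
        }
        return current;
    }
}
```

With override("requestedDoc", "responseDoc") registered,
resolve("requestedDoc") yields "responseDoc", and ids without an
override resolve to themselves.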

Yes, but my comment was rather related to Document objects. What content
(web page) you actually return is a different story.


At the moment, the concept of multiple URLs per document is typically
used for language versions (foo_{defaultlanguage}.html = foo.html)
and to support different URL suffixes (foo, foo.htm, foo.html).
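
A rough sketch of how these URL variants collapse onto one canonical
URL. This is not the DocumentBuilder implementation; the class name,
the two-letter language pattern and the ".html" canonical suffix are
assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of the Lenya API. */
final class CanonicalUrlSketch {

    // Path, optional "_xx" language part, optional ".htm"/".html" suffix.
    // Assumption: a trailing "_xx" with two lowercase letters is always a language.
    private static final Pattern URL =
            Pattern.compile("^(/.+?)(?:_([a-z]{2}))?(?:\\.html?)?$");

    private final String defaultLanguage;

    CanonicalUrlSketch(String defaultLanguage) {
        this.defaultLanguage = defaultLanguage;
    }

    /** Maps foo, foo.htm, foo.html and foo_{defaultLanguage}.html to foo.html. */
    String canonicalize(String url) {
        Matcher m = URL.matcher(url);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported URL: " + url);
        }
        String path = m.group(1);
        String language = m.group(2) != null ? m.group(2) : defaultLanguage;
        return language.equals(defaultLanguage)
                ? path + ".html"
                : path + "_" + language + ".html";
    }

    /** Two URLs represent the same document if their canonical forms match. */
    boolean sameDocument(String u1, String u2) {
        return canonicalize(u1).equals(canonicalize(u2));
    }
}
```

The point of the sketch is that comparing two URLs requires running
both through the same normalization step first.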


Language is a property required for determining the response.  It must
be included in the request, or the default is used.  Language does not
need to be in the URL; it could be handled by the session information
for the visitor.

At the moment, we only support languages as part of the URL (see
DocumentBuilder method signatures). Actually, I wouldn't like to
allow other means of language specification for the following reasons:

- Providing the language in session/cookie/etc. information can easily
  be implemented using a redirect to the corresponding URL (including
  the language).

- IMO the content represented by a URL (I mean the raw information
  content, excluding personalization etc.) shouldn't depend on session
  information etc.

- The API is probably easier when you can build document objects
  based on string information rather than complex request objects.
  (sure, the API shouldn't limit the flexibility, but see above -
  IMO redirects are sufficient)


But yes, Lenya can send the same response to different requests.


The site structure is currently tightly connected to the URL space.
Link URLs are derived directly from the site structure:
  <node id="foo">
    <node id="bar"/>
  </node>
is interpreted as
  /foo/bar
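
As a small illustration of that derivation (a sketch only, with
hypothetical class names and an in-memory tree instead of the real
site structure file):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a <node id="..."> element. */
final class SiteNode {
    final String id;
    final List<SiteNode> children;

    SiteNode(String id, SiteNode... children) {
        this.id = id;
        this.children = Arrays.asList(children);
    }
}

final class SiteUrlSketch {

    /** Returns one URL path per node, parents before their children. */
    static List<String> urls(List<SiteNode> roots) {
        List<String> result = new ArrayList<>();
        for (SiteNode root : roots) {
            collect(root, "", result);
        }
        return result;
    }

    // The URL of a node is the URL of its parent plus "/" plus its own id.
    private static void collect(SiteNode node, String parentPath, List<String> out) {
        String path = parentPath + "/" + node.id;
        out.add(path);
        for (SiteNode child : node.children) {
            collect(child, path, out);
        }
    }
}
```

For the structure above, new SiteNode("foo", new SiteNode("bar"))
produces the paths /foo and /foo/bar.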


The hierarchical index must exist, but does not need to be reflected
in the data structure.  We have discussed moving to a flat structure
for Document storage.

Yes, this is already implemented as a draft (see o.a.l.cms.repo package).

[...]

JCR allows more options for the structure, but we still have to decide
if we want the ability for a document to have several parents.  This
functionality may be useful in rare cases.  Is it worth the additional
complexity to make Lenya useful for those cases?  It would be much
more difficult to add it later than to integrate it during the
migration to JCR.

Maybe you'd like to start a thread about this? I agree that it is
worth discussing.

[...]

The ambiguity
that multiple, arbitrary URLs can point to a document would be removed.


This is the part I do not understand.  I do not see any ambiguity.

It might occur that multiple URLs represent a single Document object.
With URLs u1 and u2, it is not possible to check if u1 and u2 represent
the same document without using the DocumentBuilder.


The question is if multiple URLs for a document should be allowed at all.
Actually I don't think this is necessary. At the moment, many publications
show the following behaviour:
/foo.html       -> Hello World!
/foo_en.html    -> Hello World!
/foo_de.html    -> Hallo Welt!

Why is the support for /foo_en.html necessary? I see only two reasons:
1. Laziness. You don't have to find out the default language to create a URL.
2. You can switch the default language without creating dead URLs.
IMO neither of them outweighs the disadvantages of an ambiguous URL space.
In fact, (2) should probably be avoided because the content of a document
page changes (it becomes a different language version). So IMO it could look
like this:
/foo.html       -> Hello World!
/foo_en.html    -> 404
/foo_de.html    -> Hallo Welt!


Language must be a property of all documents.  "/foo.html" is a
shortcut to the default.

Our 1.2 publication forces /foo.html to redirect the browser to
/foo_{currentLanguage}.html.

This is IMO a good idea. The redirect removes the necessity to represent
/foo.html and /foo_en.html by the same document.
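
A minimal sketch of such a redirect rule, assuming the
foo_{language}.html naming scheme and two-letter language codes. The
class and method names are made up for the example; where the current
language comes from (session, configured default) is left to the
caller.

```java
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical sketch; not how the 1.2 publication implements it. */
final class LanguageRedirectSketch {

    // Assumption: a URL already names its language if it ends in "_xx.html".
    private static final Pattern HAS_LANGUAGE = Pattern.compile(".*_[a-z]{2}\\.html$");

    /**
     * Returns the URL to redirect to, or an empty Optional if the URL
     * already contains a language or is not an .html URL.
     */
    static Optional<String> redirectTarget(String url, String currentLanguage) {
        if (!url.endsWith(".html") || HAS_LANGUAGE.matcher(url).matches()) {
            return Optional.empty();
        }
        String base = url.substring(0, url.length() - ".html".length());
        return Optional.of(base + "_" + currentLanguage + ".html");
    }
}
```

With this rule, /foo.html never needs a Document object of its own:
it only ever answers with a redirect to a language-specific URL.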

[...]

Assuming we have two documents which are language versions of the
same content:
* language="en" uuid="1-en"
* language="de" uuid="1-de"

This could be represented for instance by the following default site structures:
1. /foo.html
   /foo_de.html
   <node id="foo" document-uuid="1-en"/>
   <node id="foo_de" document-uuid="1-de"/>
   (note that the language suffix "_de" is just a part of the URL)
2. /en/foo.html
   /de/foo.html
   <node id="en">
     <node id="foo" document-uuid="1-en"/>
   </node>
   <node id="de">
     <node id="foo" document-uuid="1-de"/>
   </node>
Assuming that a document can only be referenced once in the default site
structure, it is now trivial to map URLs to documents and vice versa, without
using a DocumentBuilder. The important fact is that the knowledge how to map
URLs to documents belongs to the component which *creates* documents. That's
why there's no knowledge duplication if you hard-code that the German
version of /en/foo should be created at /de/foo.
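
To make the "trivial mapping" concrete, here is a sketch under the
assumption stated above that each document is referenced at most once.
The class is hypothetical; it only shows that the lookup becomes a
plain two-way table.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical two-way index between URLs and document UUIDs. */
final class SiteStructureSketch {

    private final Map<String, String> uuidByUrl = new HashMap<>();
    private final Map<String, String> urlByUuid = new HashMap<>();

    /** Registers a node; rejects a second reference to the same URL or document. */
    void add(String url, String uuid) {
        if (uuidByUrl.containsKey(url) || urlByUuid.containsKey(uuid)) {
            throw new IllegalStateException(
                    "URL or document already referenced: " + url + " / " + uuid);
        }
        uuidByUrl.put(url, uuid);
        urlByUuid.put(uuid, url);
    }

    /** Returns the UUID for a URL, or null if the URL is unknown. */
    String uuidFor(String url) {
        return uuidByUrl.get(url);
    }

    /** Returns the single URL of a document, or null if it is not referenced. */
    String urlFor(String uuid) {
        return urlByUuid.get(uuid);
    }
}
```

Both example structures fit: add("/foo_de.html", "1-de") for the first,
add("/de/foo.html", "1-de") for the second. The index does not care
where the language appears in the URL.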


There was a discussion about moving language earlier in the URL and
site structure:
/pub/area/doc_language.extension ->
/pub/area/language/doc.extension

My initial reaction was it did not matter to me.  The URL does not
matter, because Lenya can retrieve the language from multiple places
(after the underline, anywhere in the URL, session information,
configured default), as long as the parameter is available.

It does matter to the site structure.
The old structure is:
docID/language (implemented by {docID}/index_{language}.html)

That's just an example.


Moving language earlier (/language/docID) or combining it with the
docID (/docID-language) would lose some abilities.

Both options must be supported.

Combining it with the docID makes most language functions very
complex.  There are good reasons for maintaining the property
separately.

This is a very good point. Document ID and language should not be
related.


(I agree the "/index" part of Lenya 1.2's structure was
unnecessary, but that will disappear with JCR.)

Having the language earlier in the site structure makes
removeLanguage() extremely easy and efficient.  But that function is
rarely used, so efficiency does not matter.

Keeping the old structure (or modifying it in JCR so "Language" is
under "Document" either as a Property of all Versions, or as a
subnode) allows createAllLanguageVersions() to be easy and efficient.
That function would be used often in any multiple language
publication.  One of Lenya's marketing points is good support for
multiple languages, so that is important.

Yes, I also thought about that. But IMO it is sufficient to provide
some basic document generation mechanisms (e.g., for /foo_de.html and
/de/foo.html URLs), and to support custom implementations.

[...]


----
Another question: With multiple site structures, how does the system keep
track of the currently selected site structure?
  - URL prefix
  - request parameter
  - arbitrary sitemap-based implementation
    (e.g. using a matcher, the session etc.)
  - ...


I think you mean using multiple Views.

Yes, multiple navigational views of the document space.


The documents will use a flat storage.

Yes.

> The "normal" hierarchical View is for documents to appear
under their parent.

Just a note - would that be the parent in the default (URL related)
site structure, i.e. /foo would be the parent of /foo/bar?

Additional Views could be specified (although the
only one I can imagine quickly would be a "Only Primary Children" View
if we allow documents to be children of multiple parents.)  The View
choice would be specified and used by the module, especially if we use
the "Area" part of the URL to specify the module.  Examples:
/pub/live/docID uses the "live" module which displays a Document while
using the hierarchical View for menus.

/pub/map[/docID] uses the "map" module which uses the hierarchical
View to display the entire structure with the optional document
highlighted.

/pub/titles[/docID] uses the "title" module which uses the flat View
to display the entire structure sorted alphabetically by Title with
the optional document highlighted.
It might be better to have an "index" module which takes the sort as
a parameter:
/pub/index/titles/docID (the optional docID is used for
highlighting "You are Here")
/pub/index/created
/pub/index/published (last published at bottom)
/pub/index/published-reverse (last published at top)
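
A sketch of how such a sort parameter could select the ordering of a
flat document list. Everything here is hypothetical (class names,
fields, timestamps as plain long values), and the ascending order for
"created" is an assumption; the other orderings follow the
descriptions above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical flat record of the properties the index sorts by. */
final class IndexedDoc {
    final String id;
    final String title;
    final long created;
    final long published;

    IndexedDoc(String id, String title, long created, long published) {
        this.id = id;
        this.title = title;
        this.created = created;
        this.published = published;
    }
}

final class IndexSortSketch {

    /** Maps the sort segment of /pub/index/{sort} to a comparator. */
    static Comparator<IndexedDoc> comparatorFor(String sort) {
        switch (sort) {
            case "titles":
                return Comparator.comparing((IndexedDoc d) -> d.title,
                        String.CASE_INSENSITIVE_ORDER);
            case "created":
                return Comparator.comparingLong((IndexedDoc d) -> d.created);
            case "published": // last published at bottom
                return Comparator.comparingLong((IndexedDoc d) -> d.published);
            case "published-reverse": // last published at top
                return Comparator.comparingLong((IndexedDoc d) -> d.published).reversed();
            default:
                throw new IllegalArgumentException("Unknown sort: " + sort);
        }
    }

    /** Returns the document ids in the requested order; the input list is not modified. */
    static List<String> index(List<IndexedDoc> docs, String sort) {
        List<IndexedDoc> sorted = new ArrayList<>(docs);
        sorted.sort(comparatorFor(sort));
        List<String> ids = new ArrayList<>();
        for (IndexedDoc d : sorted) {
            ids.add(d.id);
        }
        return ids;
    }
}
```

The optional docID would only affect highlighting, not the order, so
it does not appear in the sketch.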

That's actually not exactly what I had in mind, but it is very interesting
as well. I was only thinking of navigation widgets that operate differently
on the same URL space and therefore would have to be tracked using
the session etc.

The examples you're referring to would imply reserved URL spaces.
Actually this concept is not yet supported by the Lenya internals
(sure, you can implement it using Cocoon internals), and is IMO
too complex to be discussed in this thread (though it obviously is
related).

-- Andreas



