Sylvain Wallez wrote:

Ok, captchas + human moderation is clearly too high a barrier for spammers and even for defacers. Even infra@ would not have a problem with that.

There's an interesting chapter on circumventing captchas on Wikipedia [1]. Are we "interesting enough" in terms of Google ranking to attract such things?

Yeah! Apache is probably in the top 1000 web sites on Google. I think we are definitely a target!


Captchas are not impossible to break, of course, but they are *hard enough* and provide a first filtering barrier.

A second filtering barrier could be heuristic analysis of the comment.
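
To make that second barrier concrete, here is a rough sketch of the kind of heuristic check a comment could go through before a human ever sees it. The thresholds, the word list and the class name are invented for the example; nothing here refers to existing Cocoon code.

  import java.util.Arrays;
  import java.util.List;
  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  /** Illustrative heuristic spam check for a submitted comment. */
  public class CommentHeuristics {

      // Hypothetical thresholds and word list, chosen only for the example.
      private static final int MAX_LINKS = 3;
      private static final List<String> SUSPECT_WORDS =
              Arrays.asList("viagra", "casino", "cheap pills");
      private static final Pattern LINK =
              Pattern.compile("https?://", Pattern.CASE_INSENSITIVE);

      /** Returns true if the comment looks spammy and should be held for moderation. */
      public static boolean looksLikeSpam(String comment) {
          String lower = comment.toLowerCase();

          // 1. Too many links is the classic spam signature.
          Matcher m = LINK.matcher(lower);
          int links = 0;
          while (m.find()) {
              links++;
          }
          if (links > MAX_LINKS) {
              return true;
          }

          // 2. Obvious blacklisted vocabulary.
          for (String word : SUSPECT_WORDS) {
              if (lower.contains(word)) {
                  return true;
              }
          }
          return false;
      }

      public static void main(String[] args) {
          System.out.println(looksLikeSpam("Great patch, thanks!"));              // false
          System.out.println(looksLikeSpam(
                  "cheap pills at http://a http://b http://c http://d"));         // true
      }
  }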

But distributed human moderation is clearly a barrier that no spammer will be able to pass, unless the load becomes dramatic, and at that point blacklisting is the way to go.

A question for Stefano (and everybody else): I proposed numbers as document IDs. What do you think about this?

I used to be a fanatic about 'readable URLs'... but I think they create more problems than they solve.


First of all, the encoding is a pain. It's fine for English, but until we have IRI (internationalized resource identifiers, think "unicode meets URI") support, forget Chinese, Japanese, Cyrillic, Hebrew, Korean and so on.

One common solution is to have an English title even for non-English pages. I dislike that; it's very anglo-centric.

Well, consider the state of Cocoon, the ASF, the open source world and the whole IT industry: they're all anglo-centric. Would you have the same concerns if this was Esperanto or Interlingua rather than English (or more precisely "international English")?


Furthermore, translations must follow the original reference docs, which are the English ones. So having all language-specific resources use the same name as their English counterpart isn't a problem to me.

Fair enough, and coming from a Frenchman that is rather something ;-)

Second, people like Nielsen argue that readable URLs are easier to use and to remember. I think that's bullshit. Not even my bookmarks satisfy me anymore in terms of link management (del.icio.us + Google killed my browser bookmarks); do you really think I would type in or remember any URL today? Nonsense.

I do remember a lot of URLs, provided of course that they are meaningful. And I have a very powerful tool to help me crawl this tree of URLs that I know: the Firefox address bar autocompletion (which BTW is just a reuse of the unix command-line behaviour).


And the more you use a URL, the more it engraves itself into your mind. Nothing new in the cognition area here, but it means that a lot of regular Cocoon users know the URL space of its documentation by heart, or at least the main directory names.

Here we are different. I use Google even to get to the Cocoon web page, because typing "cocoon" and return is easier than typing http://cocoon.apache.org/ ... actually, I do [ctrl-space] co now that I have my bookmarks (both local and del.icio.us) managed by Quicksilver.


Readability of URLs just doesn't mean much to me.

There are a few things a readable URL is good for. The first is actionable breadcrumbs.



Breadcrumbs are better generated from the navigational structure than from the page path, even if the two often match.

I agree here completely.

So, if you find yourself in

  http://site.com/a/b/c/d/e/f/g

you can automatically infer something like

  site.com > a > b > c > d > e > f > g

and, for shitty web sites, that is a *tremendous* navigation help. For URLs like

  http://site.com/page/39884984

that's it, there is no hierarchical context that you can infer from it.
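
For the record, the inference itself is trivial; a hypothetical helper (nothing Cocoon-specific, it only looks at the URL path) would be something like:

  import java.net.URI;
  import java.util.ArrayList;
  import java.util.List;

  /** Illustrative breadcrumb inference from a hierarchical URL. */
  public class Breadcrumbs {

      /** Turns http://site.com/a/b/c into "site.com > a > b > c". */
      public static String infer(String url) {
          URI uri = URI.create(url);
          List<String> crumbs = new ArrayList<>();
          crumbs.add(uri.getHost());
          for (String segment : uri.getPath().split("/")) {
              if (!segment.isEmpty()) {
                  crumbs.add(segment);
              }
          }
          return String.join(" > ", crumbs);
      }

      public static void main(String[] args) {
          System.out.println(infer("http://site.com/a/b/c/d/e/f/g"));
          // site.com > a > b > c > d > e > f > g
          System.out.println(infer("http://site.com/page/39884984"));
          // site.com > page > 39884984 -- a breadcrumb, but one that tells you nothing
      }
  }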

Now, we will not have a shitty web site, so this argument doesn't apply, and Amazon (which is the most used e-commerce site in the world *and* has the worst URL space ever imagined!) shows that URL-space design does not impact usability if the pages themselves don't require it.

Yeah, but Amazon is a large catalogue of things, not documentation covering lots of different subjects, from introduction to details.

True enough.

But hypermedia allows a page to live on more than one "trail of reading", while hierarchical navigation imposes a TOC-like view, which might satisfy (and feel natural to) one user but look ugly and totally unfamiliar to others.

I think it's the "cataloguing" part that makes writing documentation so hard and that's why things like wikipedia are taking off so much instead.

I personally think that the problem with documentation is that there are two concerns:

 - writers
 - assemblers

blogs, email, wikis, all share a common paradigm: you don't need to 'assemble' your thoughts, you just dump them. Other people do the assembly.

If you wish, this is the beauty of microcontent: massive parallelization (and the reason why the web bloomed: it removed the "editing/cataloguing" bottleneck).

But the problem was that searching for stuff used to be a nightmare (see the early days of AltaVista). This "mare magnum" of content with no apparent structure made people "get lost" very easily.

This is the same feeling you have in a wiki. You have a trail of the pages you have visited, but that's useless (you have it in your browser too!); you want to be able to "browse" the content, to go from this content to something that is relevant to you.

In a book, this "relevance" was done by the author (or the editors) and was placed in sequential order. Or, if not, clustered in chapters or sections.

What a wiki misses (even the good ones like Confluence) is such a "clustering" notion... something that is easy to achieve with a more structured system like Forrest, by means of tabs or trees of links.

The problem with this approach is that there is only one way of clustering: repurposing pages becomes hell (and that's why there are so many broken links... because the clustering evolves not only with the content of the page, but with its surroundings).

By separating the concerns of writers and assemblers, you not only unleash a tremendous effort in content production (as our wiki showed)... you also allow this content to be "clusterized" and, hear hear, *in parallel*!

"Conditio sine qua non" of the above is a flat URL space.

Numeric? No, not necessarily, but flat for sure.

Actually, since geeks are used to hacking URLs but normal people are not, having a flat or bad URL space forces usability people to think about navigation inside the page and not outside it.

How I dislike those sites that require me to start from the main page and navigate down to a particular page I've already seen...

Sure, but that's a usability problem of the site, not of the URL space "per se".


Another argument, and probably more important, is that a flat URL structure gives a sense of 'wikiness' that people have come to dislike.

Now, again, this is a false impression (inspired by a plethora of bad practices rather than by actual technological limitations), but a strong one nevertheless (I do feel the same about it at times).

But *exactly* because of that, I think we should be brave and show the world that a flat URL space *does not* automatically yield 'wiki-like' flat spaces that are extremely painful to navigate.

Flat numeric URL spaces also have some extremely interesting advantages:

- pages can have their titles adjusted without impacting persistence (links are more solid over time)

Adjusting a title doesn't mean you change the page's content, in which case there's no need to change its name. And if its content changes, then it's a different page with a different name.

Fair enough.

- pages can be rearranged/repurposed/re-aggregated/re-used without impacting persistence

Agreed for "rearranged", as a flat space allows changing the navigation tree without impacting path names. But repurposing a page requires changing its name (or id), and re-aggregating means removing (aggregation) or adding (split) some pages.

Yep.

Another question is the structure of the URLs: the new effort of Sylvain, who wants to provide some docs in French, needs some thinking about where to put them.



Wait, wait! I haven't proposed to translate the docs!! That is a tremendous effort! I proposed to just translate the introductory page to accompany the French-speaking mailing list.

eheh, sure, but Reinhard did a good thing in bringing this up.

I propose

http://c.a.o/ ............... editable global docs (own repository)
http://c.a.o/fr/ ............. editable global docs in French (own repository)
http://c.a.o/2.2/ ............ editable docs of 2.2 (own repository)
http://c.a.o/2.2/fr/ ......... editable docs of 2.2 in French (own repository)
http://c.a.o/2.2.1/ .......... "frozen" docs of the 2.2.1 release
http://c.a.o/2.2.1/fr/ ....... "frozen" French docs of the 2.2.1 release
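
Just to visualise how such a layout would be resolved, a quick sketch that splits a request path into version and language; the class name, the fallbacks and the "global" marker are all hypothetical, this is not how any existing Cocoon sitemap works:

  import java.util.Arrays;

  /** Illustrative resolution of the proposed URL layout into version + language. */
  public class DocSpaceResolver {

      /** Prints the coordinates of a doc request, e.g. "/2.2/fr/sitemap". */
      public static void resolve(String path) {
          String[] parts = path.replaceFirst("^/", "").split("/");
          String version = "global";   // no version prefix -> global docs
          String language = "en";      // no language prefix -> English reference docs
          int i = 0;
          if (i < parts.length && parts[i].matches("\\d+(\\.\\d+)*")) {
              version = parts[i++];    // e.g. "2.2" or "2.2.1"
          }
          if (i < parts.length && parts[i].matches("[a-z]{2}")) {
              language = parts[i++];   // e.g. "fr" (a two-letter page name would fool this)
          }
          String page = String.join("/", Arrays.copyOfRange(parts, i, parts.length));
          System.out.println(path + " -> version=" + version
                  + " language=" + language + " page=" + page);
      }

      public static void main(String[] args) {
          resolve("/fr/index");         // global docs, French
          resolve("/2.2/fr/sitemap");   // 2.2 docs, French
          resolve("/2.2.1/userdocs");   // frozen 2.2.1 docs, English
      }
  }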



I don't think we should have frozen docs at any time; they are included in the distributions anyway, and those distributions will be persisted for the longest time.


Sun did this with the Java API and created a mess: people linked to java/1.4.2/, then 1.4.3 was created and all the links broke.

If a document shipped in 2.1.3 has a bug that was fixed in 2.1.4, why would anybody want to see it? And if 2.1.4 removed something useful for 2.1.3, that's a bug and we should fix it in the doc, rather than make everything available on the web.

So I'm -1 on this.

Agree. We may want to keep around the docs for each major release (i.e. 2.0, 2.1, 2.2) as Tomcat does, but certainly not the docs for minor releases (i.e. 2.2 and 2.2.1).

Cool.

As for the French docs, I *strongly* think that we should do this through content negotiation rather than URL design. A person accessing the page with a French browser will get the page in French; that's all they have to know (and the page will have a series of flags that will trigger an override of the locale, but that's going to be a parameter of the URL, not part of it).

The language a page is written in, just like the data type of the page, does not belong in the URL.

This makes the URL space way more "solid" over time: I can link to

 http://cocoon.apache.org/2.2/3984948

and *be sure* that it will still be there a few years from now and, by then, maybe a translation in my native language will have popped up!
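
To make the content-negotiation idea concrete, here is a minimal servlet-style sketch. It uses only the standard Servlet API (getLocale() is fed by the browser's Accept-Language header); the class name, the available-language list and the ?lang= override parameter are assumptions for the example, not an existing Cocoon component:

  import javax.servlet.http.HttpServlet;
  import javax.servlet.http.HttpServletRequest;
  import javax.servlet.http.HttpServletResponse;
  import java.io.IOException;
  import java.util.Arrays;
  import java.util.List;

  /** Illustrative language negotiation for a language-neutral doc URL. */
  public class DocServlet extends HttpServlet {

      // Hypothetical list of languages a given document is available in.
      private static final List<String> AVAILABLE = Arrays.asList("en", "fr");

      @Override
      protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
          // 1. Explicit override, e.g. /2.2/3984948?lang=fr (a parameter, not part of the URL).
          String lang = req.getParameter("lang");

          // 2. Otherwise negotiate from the Accept-Language header sent by the browser.
          if (lang == null || !AVAILABLE.contains(lang)) {
              lang = req.getLocale().getLanguage();   // e.g. "fr" for a French browser
              if (!AVAILABLE.contains(lang)) {
                  lang = "en";                        // fall back to the reference docs
              }
          }

          resp.setContentType("text/html; charset=UTF-8");
          resp.setHeader("Content-Language", lang);
          resp.setHeader("Vary", "Accept-Language");  // tell caches the response depends on it
          resp.getWriter().println("Serving document " + req.getPathInfo() + " in " + lang);
      }
  }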

And why shouldn't e.g. http://cocoon.apache.org/2.1/userdocs/flow/continuations.html be there?

Example: because somebody decided to split the user section by "concern area", so continuations now belong at


 http://cocoon.apache.org/2.1/users/programmer/flow/continuations.html

but not everybody thought this was a good idea, so we have a redirect from the old URI to the new one... but down the road, somebody from the Lisp world comes along, shows that the term "continuation" is actually misleading, and convinces us to change this to "webcontinuations", so that we now have a redirect from the old URL to the newer one to the newest one.

But it's true that persistence of a URL is a property of those administering it, not of the URL itself.

let's be brave!

Let's be brave and dive into a fog of meaningless URLs? I'm not convinced...

I showed why I want a flat URL space.

Now, I could be convinced to change from

 http://cocoon.apache.org/2.2/3940834

to

 http://cocoon.apache.org/2.2/continuations

but only if we mandate that titles cannot contain '/'. This will force people to test for URL naming collisions and will carve an anglophone-centric view into our system (for now and forever!), but I can live with that.
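
If we did go for readable flat names, the collision test could be as simple as the sketch below; the slugging rules and the in-memory registry are invented for illustration (and the forced ASCII reduction is exactly the anglophone-centric carving mentioned above):

  import java.text.Normalizer;
  import java.util.HashSet;
  import java.util.Locale;
  import java.util.Set;

  /** Illustrative flat-name registry that rejects colliding page titles. */
  public class FlatNameRegistry {

      private final Set<String> takenNames = new HashSet<>();

      /** Reduces a title to a flat, '/'-free, ASCII-only name (hence the anglo-centric bias). */
      public static String slug(String title) {
          String ascii = Normalizer.normalize(title, Normalizer.Form.NFD)
                                   .replaceAll("\\p{M}", "");       // drop accents
          return ascii.toLowerCase(Locale.ROOT)
                      .replaceAll("[^a-z0-9]+", "-")                // no '/', no spaces
                      .replaceAll("(^-|-$)", "");
      }

      /** Registers a title; returns false if its flat name collides with an existing page. */
      public boolean register(String title) {
          return takenNames.add(slug(title));
      }

      public static void main(String[] args) {
          FlatNameRegistry registry = new FlatNameRegistry();
          System.out.println(registry.register("Continuations"));     // true
          System.out.println(registry.register("continuations!"));    // false -- collision
          System.out.println(slug("Évaluation des continuations"));   // evaluation-des-continuations
      }
  }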

thoughts?

--
Stefano.


