Ciaran McNulty wrote:
On Sun, Jun 29, 2008 at 3:07 PM, Duncan Cragg <[EMAIL PROTECTED]> wrote:
Those of us who favour opaque URLs (actually for practical reasons such as
clean separation of concerns, maintainability, etc.) are unhappy with being
forced into a semantic URL schema when using rel-tag.

Can you go into a bit more detail, or point to a resource explaining
the benefits of opaque URLs?  It's something I've not come across
before and I'd be intrigued to see the reasons behind it.

I'll do both. Here's a resource explaining it - I addressed the subject in this blog post:

http://duncan-cragg.org/blog/post/content-types-and-uris-rest-dialogues/

That is a very transparent URL (see: I'm not obsessive about it!).
The trouble with my URL is that it mixes three concerns:

1. making a connection to my server and kicking off HTTP
2. identifying a resource (with a completely opaque string) within HTTP
3. kicking off some Python code with an argument string

It's 1. and 3. I'm talking about. URLs are already opaque to HTTP.
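To make that concrete with the URL above (the opaque token below is made up for illustration):

http://duncan-cragg.org/blog/post/content-types-and-uris-rest-dialogues/

1. 'duncan-cragg.org' - where to connect and start the HTTP conversation
2.+3. '/blog/post/content-types-and-uris-rest-dialogues/' - one string doing double duty: the resource identifier within HTTP, and an argument string handed on to the blog engine's code

An opaque equivalent keeps 2. and drops 3.:

http://duncan-cragg.org/c90ff5a2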

As soon as you allow syntax or schema into URLs - as soon as you start using anything other than long random numbers - you've got a problem of namespace allocation and schema standardisation. I refer to "Zooko's Triangle" on my blog's right rail, which discusses the trade-off between global uniqueness, security and memorability.
_________________________________________

On 1.: Unless you're running fancy P2P algorithms, it's hard to argue against putting a big hint in the URL to say where to go to find the resource. But don't forget that you needn't go to that server - you could ask an intermediary proxy - which is a kind of simplistic P2P algorithm... However, there is a case for arguing that DNS has been a failure: it isn't any easier to type a URL when you know you have to be so precise to avoid scam sites. And it isn't any easier to use it to identify a site when you have to avoid the likes of www.yahoo.com.baddies.com or www.google.randomtld. You may as well just use IP addresses: just as hard to type and just as useless to read. Most programs come with a copy-paste function to save some typing...

Add to this lack of security (and other security holes) the absurd scramble for domain-name real estate, domain squatting and the like, and DNS looks like a system that only system admins and crooks benefit from. Most people (including myself) would type 'acme' into Google instead of 'acme.com' into the URL bar, to gain an extra level of intelligence, familiarity, trust and user interface consistency.
_________________________________________

But really it's 3. that bothers me most. Using URLs to pass human-readable strings to an application 'above' HTTP.

A transparent URL string is always a query string (whether it has a '?' or not) - in other words, it could potentially be ambiguous and return, not definitely one, but zero or many possible results. We probably get zero results when we 'hack' a URL or when the site gets reorganised. We gloss over the many-results case by returning a single page that we call 'query results'. But by allowing in zero or many resources so easily, we've loosened the Web by removing the definite 1-1 mapping of URL to resource.
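For example (made-up paths), starting from a link somebody actually gave you:

GET /blog/2008/06/rest-dialogues/  -> the one post the link was minted for
GET /blog/2008/06/                 -> a 'query results' page: many posts
GET /blog/2008/13/                 -> no such month: zero results
GET /c90ff5a2                      -> always the single resource bound to that key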

Hackable URLs should not be part of a self-respecting website's user interface. We would give a better user experience if we took the URL bar away and replaced it with a 'jump to first clipboard web link' button, for those copy-paste situations. Such a button would intelligently parse the text on the clipboard for URLs and jump to the first location discovered. A good information architecture and user interaction design makes hackable URLs irrelevant.

Another problem is when people start using their knowledge of the URL structure to generate new URLs - it may be acceptable or encouraged (even prescribed in an HTML GET form), but each time it happens, we're creating a unique mini-contract - another non-standard schema. The Web thrives on URL proliferation, not on schema proliferation!
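An HTML GET form makes the mini-contract explicit - this made-up example prescribes the 'http://example.org/search?q=...' schema, and every client that fills it in is now coupled to it:

<form method="get" action="http://example.org/search">
  <input type="text" name="q">
  <input type="submit" value="Search">
</form>
<!-- submitting 'semweb' constructs http://example.org/search?q=semweb -->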

The need for URLs to be reliable - to always return what they are expected to return each time they're used - means that whatever URL schema or namespace you come up with is something you're stuck with - people or even programs may depend on it. But there's no standards body or namespace body looking after the bigger picture for you. Your mistakes may haunt you for a long time.

Also, query URLs are inherently /not/ reliable - the resource they return is /expected/ to change, which again makes their (re)use less desirable.

Clearly, the W3C's unfortunate 'httpRange-14' issue would never have occurred with opaque URLs. In other words, opaque, semantics-free HTTP URIs are /always/ dereferenceable to 'information resources' and /never/ refer to cars! Strings that are part of a car domain model belong inside /content/, not in links to content - they belong above HTTP. I'm not fully conversant with the Semantic Web, but I suspect some of its issues are caused by mixing up globally unique identifier strings, used to build information structures, with semantically meaningful strings over those structures - strings that can dereference to sets.

So my main objection to transparent URLs is the way they mix up the mechanism for linking up the Web with a mechanism for querying it. The Web works fine using HTTP and opaque URLs. We have POST and Content-Type and OpenSearch schemas to query the Web.
_________________________________________

Practical examples...

You can return opaque links to time-ordered collections listing the latest documents to be tagged 'semweb':

<a class="tag" href="http://tagbeat.com/3720a-993117b">semweb</a>

Keep your URLs opaque (like GUIDs in databases) and put your application data and queries in the content (like SQL queries and result sets in databases). Give your query content resources a first-class schema - see OpenSearch - and even their own URLs. POST these queries to opaque collection URLs. Make your result sets transient (returned in the POST response, thus no-cache by default). Result sets should only be 'grounded' (thus linkable and cacheable) if explicitly asked for in the query, when you should redirect to a new resource in the POST response.
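A sketch of the exchange, with made-up URLs and an invented query body (not literal OpenSearch syntax):

POST /3720a-993117b HTTP/1.1
Host: tagbeat.com
Content-Type: application/x-tag-query+xml

<query><tag>semweb</tag><since>2008-06-01</since></query>

Transient result set, returned directly in the POST response:

HTTP/1.1 200 OK
Cache-Control: no-cache
Content-Type: application/xhtml+xml

...links to the matching opaque document URLs...

Or, if the query asked to be grounded, redirect to a freshly-minted resource:

HTTP/1.1 303 See Other
Location: http://tagbeat.com/8812f-40aa1c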

Of course, you can still surround the UUID/GUID part of your opaque URLs with human-readable string decorations, as long as they're never used to dereference the resource - they're there purely as mnemonics, or for search engine optimisation.
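For example (a made-up decoration of the link above), where the server dereferences only the opaque token and could ignore the trailing words entirely:

http://tagbeat.com/3720a-993117b/documents-tagged-semweb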
_________________________________________

I've gone on at length (again!), but I hope you've had the patience to get my point of view. =0)

Cheers!

Duncan Cragg

PS I work at the Financial Times over the river from you - but I was a URL opacitist /before/ having to wrangle with the FT CMS...!


