Ciaran McNulty wrote:
On Sun, Jun 29, 2008 at 3:07 PM, Duncan Cragg <[EMAIL PROTECTED]> wrote:
Those of us who favour opaque URLs (actually for practical reasons such as
clean separation of concerns, maintainability, etc.) are unhappy with being
forced into a semantic URL schema when using rel-tag.

Can you go into a bit more detail, or point to a resource explaining
the benefits of opaque URLs?  It's something I've not come across
before and I'd be intrigued to see the reasons behind it.

I'll do both. Here's a resource explaining it - I addressed the subject in this blog post:

http://duncan-cragg.org/blog/post/content-types-and-uris-rest-dialogues/

That is a very transparent URL (see: I'm not obsessive about it!).
The trouble with my URL is that it mixes three concerns:

1. making a connection to my server and kicking off HTTP
2. identifying a resource (with a completely opaque string) within HTTP
3. kicking off some Python code with an argument string

It's 1. and 3. I'm talking about. URLs are already opaque to HTTP.
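To make that concrete with the URL above (the opaque token below is made up for illustration):

http://duncan-cragg.org/blog/post/content-types-and-uris-rest-dialogues/

1. 'duncan-cragg.org' - where to connect and start the HTTP conversation
2.+3. '/blog/post/content-types-and-uris-rest-dialogues/' - one string doing double duty: the resource identifier within HTTP, and an argument string handed on to the blog engine's code

An opaque equivalent keeps 2. and drops 3.:

http://duncan-cragg.org/c90ff5a2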

As soon as you allow syntax or schema into URLs - as soon as you start using anything other than long random numbers - you've got a problem of namespace allocation and schema standardisation. I refer to "Zooko's Triangle" on my blog's right rail, which discusses the trade-off between global uniqueness, security and memorability.
_________________________________________

On 1.: Unless you're running fancy P2P algorithms, it's hard to argue against putting a big hint in the URL to say where to go to find the resource. But don't forget that you needn't go to that server - you could ask an intermediary proxy - which is a kind of simplistic P2P algorithm... However, there is a case for arguing that DNS has been a failure: it isn't any easier to type a URL when you know you have to be so precise to avoid scam sites. And it isn't any easier to use it to identify a site when you have to avoid the likes of www.yahoo.com.baddies.com or www.google.randomtld. You may as well just use IP addresses: just as hard to type and just as useless to read. Most programs come with a copy-paste function to save some typing...

Add to this lack of security (and other security holes) the absurd scramble for domain-name real estate, domain squatting and the like, and DNS looks like a system that only system admins and crooks benefit from. Most people (including myself) would type 'acme' into Google instead of 'acme.com' into the URL bar, to gain an extra level of intelligence, familiarity, trust and user interface consistency.
_________________________________________

But really it's 3. that bothers me most. Using URLs to pass human-readable strings to an application 'above' HTTP.

A transparent URL string is always a query string (whether it has a '?' or not) - in other words, it could potentially be ambiguous and return, not definitely one, but zero or many possible results. We probably get zero results when we 'hack' a URL or when the site gets reorganised. We gloss over the many-results case by returning a single page that we call 'query results'. But by allowing in zero or many resources so easily, we've loosened the Web by removing the definite 1-1 mapping of URL to resource.
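For example (made-up paths), starting from a link somebody actually gave you:

GET /blog/2008/06/rest-dialogues/  -> the one post the link was minted for
GET /blog/2008/06/                 -> a 'query results' page: many posts
GET /blog/2008/13/                 -> no such month: zero results
GET /c90ff5a2                      -> always the single resource bound to that key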

Hackable URLs should not be part of a self-respecting website's user interface. We would give a better user experience if we took the URL bar away and replaced it with a 'jump to first clipboard web link' button, for those copy-paste situations. Such a button would intelligently parse the text on the clipboard for URLs and jump to the first location discovered. A good information architecture and user interaction design makes hackable URLs irrelevant.

Another problem is when people start using their knowledge of the URL structure to generate new URLs - it may be acceptable or encouraged (even prescribed in an HTML GET form), but each time it happens, we're creating a unique mini-contract - another non-standard schema. The Web thrives on URL proliferation, not on schema proliferation!
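An HTML GET form makes the mini-contract explicit - this made-up example prescribes the 'http://example.org/search?q=...' schema, and every client that fills it in is now coupled to it:

<form method="get" action="http://example.org/search">
  <input type="text" name="q">
  <input type="submit" value="Search">
</form>
<!-- submitting 'semweb' constructs http://example.org/search?q=semweb -->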

The need for URLs to be reliable - to always return what they are expected to return each time they're used - means that whatever URL schema or namespace you come up with is something you're stuck with - people or even programs may depend on it. But there's no standards body or namespace body looking after the bigger picture for you. Your mistakes may haunt you for a long time.

Also, query URLs are inherently /not/ reliable - the resource they return is /expected/ to change, which again makes their (re)use less desirable.

Clearly, the W3C's unfortunate 'httpRange-14' issue would never have occurred with opaque URLs. In other words, opaque, semantics-free HTTP URIs are /always/ dereferenceable to 'information resources' and /never/ refer to cars! Strings that are part of a car domain model belong inside /content/, not in links to content - they belong above HTTP. I'm not fully conversant with the Semantic Web, but I suspect some of its issues are caused by mixing up globally unique identifier strings, used to build information structures, with semantically meaningful strings over those structures - strings that can dereference to sets.

So my main objection to transparent URLs is the way they mix up the mechanism for linking up the Web with a mechanism for querying it. The Web works fine using HTTP and opaque URLs. We have POST and Content-Type and OpenSearch schemas to query the Web.
_________________________________________

Practical examples...

You can return opaque links to time-ordered collections listing the latest documents to be tagged 'semweb':

<a class="tag" href="http://tagbeat.com/3720a-993117b">semweb</a>

Keep your URLs opaque (like GUIDs in databases) and put your application data and queries in the content (like SQL queries and result sets in databases). Give your query content resources a first-class schema - see OpenSearch - and even their own URLs. POST these queries to opaque collection URLs. Make your result sets transient (returned in the POST response, thus no-cache by default). Result sets should only be 'grounded' (thus linkable and cacheable) if explicitly asked for in the query, when you should redirect to a new resource in the POST response.
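A sketch of the exchange, with made-up URLs and an invented query body (not literal OpenSearch syntax):

POST /3720a-993117b HTTP/1.1
Host: tagbeat.com
Content-Type: application/x-tag-query+xml

<query><tag>semweb</tag><since>2008-06-01</since></query>

Transient result set, returned directly in the POST response:

HTTP/1.1 200 OK
Cache-Control: no-cache
Content-Type: application/xhtml+xml

...links to the matching opaque document URLs...

Or, if the query asked to be grounded, redirect to a freshly-minted resource:

HTTP/1.1 303 See Other
Location: http://tagbeat.com/8812f-40aa1c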

Of course, you can still surround the UUID/GUID part of your opaque URLs with human-readable string decorations, as long as they're never used to dereference the resource - they're there purely as mnemonics, or for search engine optimisation.
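For example (a made-up decoration of the link above), where the server dereferences only the opaque token and could ignore the trailing words entirely:

http://tagbeat.com/3720a-993117b/documents-tagged-semweb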
_________________________________________

I've gone on at length (again!), but I hope you've had the patience to get my point of view. =0)

Cheers!

Duncan Cragg

PS I work at the Financial Times over the river from you - but I was a URL opacitist /before/ having to wrangle with the FT CMS...!


