Re: URL Theory & Best Practices

Miles Elam Sat, 09 Nov 2002 12:09:04 -0800

Tony Collen wrote:

Comments inline...
Miles Elam wrote:
But can't delivered types differ by the incoming client?
Yes, but a problem then arises when someone is using IE and they want a PDF, when your user-agent rules will only serve a PDF for FooCo PDF Browser 1.0. IMO browsers should respect the mime-type header. I believe the mime-type header is very useful when you want to use something like a PHP script to send an image or a .tar.gz file. In fact, it's essential for it to work, otherwise the browser interprets the data as garbage.

No, that wasn't my intention at all. If someone is using IE and they want a PDF (not a default expectation for that particular browser like HTML or XML), then the URL they would get directed to would be *.pdf. This is not the intrinsic resource. You are explicitly asking for the PDF representation of that resource.

If the browser's default expectation is PDF (like in your FooCo PDF Browser 1.0 example), the trailing slash resource would give it PDF. However, it could still be pointed to *.pdf if you wanted to make it explicit.

In those cases where only PDF is available (common when it's not dynamically generated), I see no reason why the URI wouldn't be *.pdf. In fact, if in the future more presentation types are added, a special case for *.pdf to return a static resource and all other variations being dynamically generated (or some other mixing and matching) would still be valid and a stable URI space.

As far as a php script returning an image, that's fine, but if the URL ends with (or even contains) any reference to "php", you are tying your URI to a particular technology/delivery method. With Cocoon, why not map /foo/bar/alpha.png to the PHP script that returns a PNG image? In this case, I'm not advocating the trailing slash. I am advocating that you not have PHP even mentioned in the URL. In this case, the resource is a PNG image without regard to client -- have the URL reflect this.

This is where we differ slightly. In my mind /a/b/ is the intrinsic resource. /a/b/index.html is the explicit call for the HTML representation of /a/b/. If you redirect a client to /a/b/index.html and the client bookmarks it, they are bookmarking the HTML representation, not the intrinsic resource. I understand the efficiency issues, but a user agent match, when viewed in the context of sitemap matches, server-side logic, servlet request and response object creation and other assorted method calls, is just a couple of string comparisons.
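To make the distinction concrete, here is a rough Python sketch (not Cocoon sitemap syntax) of the rule described above: a trailing-slash URL names the intrinsic resource and the client's default expectation picks the representation, while an explicit extension asks for one representation regardless of client. The names AGENT_DEFAULTS, FALLBACK and resolve are made up for illustration, and matching on a User-Agent prefix is an assumption.

```python
import posixpath

# Hypothetical table: User-Agent prefix -> representation that client expects by default.
AGENT_DEFAULTS = {"FooCo PDF Browser": "pdf"}
# Assumed default for every other client.
FALLBACK = "html"


def resolve(path, user_agent):
    """Return (intrinsic_resource, representation) for one request."""
    if path.endswith("/"):
        # Trailing slash: the intrinsic resource, so the client's default decides.
        for prefix, representation in AGENT_DEFAULTS.items():
            if user_agent.startswith(prefix):
                return path, representation
        return path, FALLBACK
    # Otherwise the extension is an explicit request for one representation.
    directory, filename = posixpath.split(path)
    _stem, extension = posixpath.splitext(filename)
    if not extension:
        raise ValueError("no trailing slash and no extension: " + path)
    return directory.rstrip("/") + "/", extension.lstrip(".")
```

With this sketch, IE asking for /a/b/ gets HTML, the FooCo browser asking for /a/b/ gets PDF, and anyone asking for /a/b/index.pdf gets PDF. The user-agent check is the couple of string comparisons mentioned above.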

This is pretty much the original problem I was trying to solve. Sure, having a clean URL space that always ends in a / is useful, but if you look at how that would work on the server side, it means you create a physical directory for each page and then create an index.html. You have tons of files named index.html on your web server, but at least it's all organized with the directories.

Hmmm... Why is it that your physical directory structure must have ANYTHING to do with the URL? This flies right in the face of the reason for Cocoon's sitemap and the resources made available from Apache's httpd.conf. You would indeed have many URLs that point to a resource called index.html, but your filesystem need not have any. Your filesystem could be flat without any directories at all. It could be replaced with a database. ...or LDAP or xmldb or PHP...

If your filesystem is to be 1:1 with your URLs, why use Cocoon and a servlet engine at all? A flat file webserver would serve things much faster. The reason I want to use Cocoon is that it makes things *better* and not faster -- although I have methods for getting extra speed.
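As a loose illustration of a URL space that owes nothing to the directory layout, here is a small Python sketch of a route table. It is not how Cocoon's sitemap is implemented; the patterns, the backend functions and the dispatch helper are all hypothetical, and the backends return placeholder strings rather than real content.

```python
from fnmatch import fnmatchcase


# Hypothetical backends: neither one reads a file from a matching directory.
def from_script(path):
    return "output of a script for " + path


def from_database(path):
    return "database record for " + path


# Public URL pattern -> backend. The URL never names the technology behind it.
ROUTES = [
    ("/foo/bar/*.png", from_script),
    ("/articles/*", from_database),
]


def dispatch(path):
    """Send a request path to the first backend whose pattern matches."""
    for pattern, backend in ROUTES:
        if fnmatchcase(path, pattern):
            return backend(path)
    raise LookupError("no route for " + path)
```

Swapping from_script for some other backend changes one entry in ROUTES and leaves /foo/bar/alpha.png exactly where clients expect it.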

In my opinion, URLs should not change.
This is explained further at <http://www.useit.com/alertbox/990321.html>. The rundown:

- URLs should not change
- URLs are easy to remember (and therefore are organized logically)
- URLs are easy to type and are generally all in lowercase

That is one of the main things that drew me to Cocoon: URI abstraction. Once the URL is abstracted enough to act as a true URI, it can start acting as a true identifier instead of ad hoc, vague gobbledygook. Of course this also assumes that the URL/URI remains set in stone and not a moving target.

Yes! This is exactly the conclusion I was coming to on my own. URIs are no more than data abstractions. They usually provide a view to some data, and more often than not, a URL on a web server directly correlates with a physical file on a disk (e.g. index.html). Cocoon allows one to create a purely virtual URL space in which no real files need exist on the server. It probably doesn't matter how the underlying data is abstracted, whether it be a one-to-one correlation to a directory tree on a disk somewhere, or an XPath statement into an XML file, or arguments to a CGI script that accesses a database depending on the order of the items in the request. Imagine a request for /articles/bydate/2002/10/31/ mapping to articles.php?mode=bydate&year=2002&month=10&day=31, which in turn queries a database. Accessing a URL can provide a default view of the data, and depending on the request, the data can be presented in different ways. In the case of things like PHP and CGI scripts, the URL sometimes accepts incoming data (GET or POST data) and will return different results based on the messages passed to it. Cocoon allows you to provide different views of a resource based on the User-Agent string which is supplied by the browser. URLs represent objects.

We are in agreement.
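The /articles/bydate/2002/10/31/ example above could be sketched like this in Python. The regular expression and the rewrite function are illustrative assumptions, not anything taken from Cocoon or from a real site; only the two URLs come from the example itself.

```python
import re
from urllib.parse import urlencode

# Assumed shape of the public URL: /articles/<mode>/<yyyy>/<mm>/<dd>/
ARTICLE_URL = re.compile(
    r"^/articles/(?P<mode>[a-z]+)/(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/$"
)


def rewrite(path):
    """Map a clean public URL to the internal script call, or None if it does not match."""
    match = ARTICLE_URL.match(path)
    if match is None:
        return None
    # Build the query string in a fixed, explicit order.
    query = urlencode(
        [(name, match.group(name)) for name in ("mode", "year", "month", "day")]
    )
    return "articles.php?" + query


print(rewrite("/articles/bydate/2002/10/31/"))
# prints: articles.php?mode=bydate&year=2002&month=10&day=31
```

The client only ever sees the first form, so articles.php can later be replaced without breaking a single bookmark.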

This way the extension isn't revealing the underlying technology of the site, but the type of file the client is expecting, and this goes for directories too.
If all we're really serving up is data, and XML is "just data" (http://radio.weblogs.com/0101679/), then perhaps all of our matches should match for *.xml. Based on other things, like the User-Agent string, or request parameters, we can provide different views of the data (PDF, SVG, HTML etc). A page named "foo.xml" could be an instance of intelligent data, whereby Cocoon supplies the "smarts" to change the data depending on any number of conditions. In the end, it probably doesn't matter how the data is abstracted, as long as it's consistent, easy to use, and is mostly permanent (or rather, will be flexible if the abstraction changes in the future).

Life will be so much easier in 5 years when we're just serving up straight up xml files. Unfortunately this puts Cocoon out of business ;)

No, a match for *.xml would be a request for the XML *representation* of the resource. XML is not intrinsic. It may be the starting point for Cocoon's pipelines. It may be the contracts all through the pipelines. It may be a format that can represent the semantic meaning behind a resource. It is not the resource.

All of your matchers may indeed be for *.xml if your client base fits what you are serving. Still, that's not the intrinsic resource. Your starting point in a pipeline could be a simple, tab-delimited text file that you export as XML. Plain text is still not the intrinsic resource. The intrinsic resource is the information. Period. As soon as it's serialized in some format, as soon as it is marked up, as soon as it is generated, it ceases to be pure information -- an intrinsic resource. This is the point I am trying to drive home. An intrinsic resource can never be what people see. You might as well try to draw a picture of someone's brain to illustrate what they know.
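For what it's worth, the tab-delimited case can be sketched in a few lines of Python. This is only an illustration of one serialization being turned into another; the element names records and record, and the function name tsv_to_xml, are made up, and the sketch assumes the header fields are valid XML element names.

```python
import csv
import io
import xml.etree.ElementTree as ET


def tsv_to_xml(text):
    """Re-serialize tab-delimited text (first line = field names) as XML."""
    rows = csv.DictReader(io.StringIO(text), delimiter="\t")
    root = ET.Element("records")  # hypothetical wrapper element
    for row in rows:
        record = ET.SubElement(root, "record")
        for name, value in row.items():
            # Assumes each header field is a valid XML element name.
            ET.SubElement(record, name).text = value
    return ET.tostring(root, encoding="unicode")


source = "title\tyear\nURL Theory\t2002\n"
print(tsv_to_xml(source))
```

Both the tab-delimited text and the XML it produces carry the same information, and neither one is that information.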

It may be that clients in the future are all XML/XSLT/XInclude/XForms capable. That doesn't change much for Cocoon. The only way that serving XML files might kill Cocoon is if there were no dynamic data. With the sheer volume of information today, let alone tomorrow, I don't see that happening.

Life won't be easier in five years; it'll be the same with different trappings. Accessing information may be easier in five years, though, as long as people try to make it more accessible.

- Miles

P.S. Thank god for the mailing lists. They actually encourage me to write down some of my thoughts, even if they are off the mark more often than not... Does this make email better than the web, or simply justify the need for more discussion on the web?
