Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Jonathan Rochkind writes:
> There are trade-offs. I think a lot of that TAG stuff privileges the theoretically pure over the on-the-ground practicalities. They've got a great fantasy in their heads of what the semantic web _could_ be, and I agree it's theoretically sound and _could_ be; but you've got to make it convenient and cheap if you actually want it to happen for real, sometimes sacrificing theoretical purity. And THAT'S one important lesson of the success of the WWW.

Very true and very important. I've seen this stated most succinctly by Clay Shirky: "You cannot simultaneously have mass adoption and rigor." I hope one day I can come up with eight words as pithy as that.

Mike Taylor  m...@indexdata.com  http://www.miketaylor.org.uk
"Good craftsmanship may not be art, but good art incorporates good craftsmanship" -- Jane MacDonald.
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Alexander Johannesen wrote:
> I think you are quite mistaken on this, but before we leap into whether the web is suitable for SuDoc I'd rather point out that SuDoc isn't web friendly in itself, and *that* more than anything stands in the way of using them with the web.

It stands in the way of using them in the fully realized sem web vision. It does NOT stand in the way of using them in many useful ways that I can and want to use them _right now_. Ways which are MUCH helped by having a URI to refer to them. Whether it can resolve or not (YOU just made the point that a URI doesn't actually need to resolve, right? I'm still confused by this having it both ways -- URIs don't need to resolve, but if your URIs don't resolve then you're doing it wrong. Huh?), if you have a URI for a SuDoc you can use it in any infrastructure set up to accept, store, and relate URIs. Like an OpenURL rft_id, and, yeah, like RDF even. You can make statements about a SuDoc if it has a URI, whether or not it resolves, whether or not SuDoc itself is 'web friendly'. One step at a time.

This is my frustration with semantic web stuff: making it harder to do things that we _could_ do right here and now, because it violates a fantasy of an ideal infrastructure that we may never actually have. There are business costs, as well as technical problems, to be solved to create that ideal fantasy infrastructure. The business costs are _real_.

> Also, having a unified resolver for SuDoc isn't hard; it can be at a fixed URL, and use a parameter for identifiers. You don't need to snoop the non-parameterized section of a URI to get the IDs.

Okay, Alex, why don't you set this up for us then? And commit to providing it persistently indefinitely? Because I don't have the resources to do that. And for the use cases I am confronted with, I don't _need_ it; any old URI, even a non-resolvable one, will do -- yes, as long as I can recognize it as a SuDoc and extract the bare SuDoc out of it.

Which you say I shouldn't be doing (while others say that's a misreading of those docs to think I shouldn't be doing it) -- but avoiding doing that would raise the costs of my software quite a bit, and make the feature infeasible in the first place. Business costs and resources _matter_.

I'm being a bit disingenuous here, because rsinger actually already _has_ set something like this up, using purl.org. Which isn't perfect, but it's there, so fine. I still don't even need it for what I'm doing.

> No it's not; if you design your system RESTfully (which, indeed, HTTP is) then the discovery part can be fast, cached, and, using URI templates embedded in HTTP responses, fully flexible and fit for your purposes.

Feel free to contribute code to my open source project (Umlaut) to accomplish the things I need to do in an efficient manner while making an HTTP request for every single rft_id that comes in. These URIs are _external_ URIs from third parties; I have no control over whether they are designed RESTfully or not. But if you contribute the code, and it's good code, I'll be happy to use it.

In the meantime, I'll continue trying to balance functionality, maintainability, future expansion, and the programming and hardware resources available to me, same as I always do, here in the real world where we're building production apps, not R&D experiments, and where we don't have complete control over the entire environment we operate in. You telling me that everything would work great _if only_ everyone in the whole world that I need to inter-operate with did things the way you say they should -- does absolutely nothing for me. And this, again, is my frustration with many of these semantic web arguments I'm hearing -- describing an ideal fantasy world that doesn't exist, but insisting we act as if it does, even if that means putting barriers in the way of actually getting things done. I'd like to actually get things done while moving bit-by-bit toward the semantic web vision. I can't if the semantic web vision insists that everything must be perfect, and disallows alternate solutions, alternate trade-offs, and alternate compromises. I don't have time for that; I'm building actual production apps with limited resources.

Jonathan
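[A rough sketch of the point Jonathan is making -- that a URI for a SuDoc is usable in URI-accepting infrastructure such as an OpenURL rft_id even if it never resolves. The purl.example.org prefix, the resolver hostname, and the SuDoc class number are all made up for illustration; the real prefix would be whatever the minter (e.g. rsinger's purl.org setup) actually chose.]

```python
from urllib.parse import quote, urlencode

def sudoc_uri(sudoc):
    # SuDoc numbers contain spaces, periods, colons, and slashes, so the
    # bare identifier must be percent-encoded before embedding in a URI.
    # The prefix here is hypothetical.
    return "http://purl.example.org/sudoc/" + quote(sudoc, safe="")

uri = sudoc_uri("Y 4.F 76/2:AF 8/12")
print(uri)

# The URI can now travel through any URI-accepting infrastructure,
# e.g. as an OpenURL rft_id, whether or not it resolves:
openurl = "http://resolver.example.edu/openurl?" + urlencode({
    "url_ver": "Z39.88-2004",
    "rft_id": uri,
})
print(openurl)
```

Nothing in this sketch ever dereferences the URI; it is used purely as an identifier.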
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Hiya,

On Thu, Apr 16, 2009 at 01:10, Jonathan Rochkind rochk...@jhu.edu wrote:
> It stands in the way of using them in the fully realized sem web vision.

Ok, I'm puzzled. How? As the SemWeb vision is all about first-order logic over triplets, and the triplets are defined as URIs, if you can pop something into a URI you're good to go. So how is it that SuDoc doesn't fit into this, as you *can* chuck it in a URI? I said it was unfriendly to the Web, not impossible.

> It does NOT stand in the way of using them in many useful ways that I can and want to use them _right now_.

Ah, but then go fix it.

> Ways which are MUCH helped by having a URI to refer to them. Whether it can resolve or not (YOU just made the point that a URI doesn't actually need to resolve, right? I'm still confused by this having it both ways -- URIs don't need to resolve, but if your URIs don't resolve then you're doing it wrong. Huh?)

C'mon, it ain't *that* hard. :) URIs as identifiers is fine; having them resolve as well is great. What's so confusing about that?

> if you have a URI for a SuDoc you can use it in any infrastructure set up to accept, store, and relate URIs. Like an OpenURL rft_id, and, yeah, like RDF even. You can make statements about a SuDoc if it has a URI, whether or not it resolves, whether or not SuDoc itself is 'web friendly'. One step at a time. This is my frustration with semantic web stuff, making it harder to do things that we _could_ do right here and now, because it violates a fantasy of an ideal infrastructure that we may never actually have.

Huh? The people who made SuDoc didn't make it web friendly, and thus the SemWeb stuff is harder to do because it lives on the web? (And chucking your metadata into HTML as MF or RDF snippets ain't that hard; it just requires a minimum of knowledge.)

> There are business costs, as well as technical problems, to be solved to create that ideal fantasy infrastructure. The business costs are _real_

No more real than the cost currently in place. The thing is that a lot of people see the traditional cost disappear with the advent of SemWeb and the new costs heavily reduced.

> Okay, Alex, why don't you set this up for us then?

Why? I don't give a rat's bottom about SuDoc; I don't need it, I think it's poorly designed, and it gives me nothing in life. Why should I bother? (Unless I'm given money for it, then I'll start caring ... :)

> And commit to providing it persistently indefinitely? Because I don't have the resources to do that.

Who's behind SuDoc, and are they serious about their creation? That's the people you should send your anger to instead.

> And for the use cases I am confronted with, I don't _need_ it, any old URI, even not resolvable, will do -- yes, as long as I can recognize it as a SuDoc and extract the bare SuDoc out of it.

So what's the problem with just making some stuff up? If you can do your thing in a vacuum I don't fully understand your problem with the SemWeb stuff. If you don't want it, don't use it.

> Which you say I shouldn't be doing (while others say that's a misreading of those docs to think I shouldn't be doing it)

No, I think this one is the subtle difference between a URL and a URI.

> but avoiding doing that would raise the costs of my software quite a bit, and make the feature infeasible in the first place. Business costs and resources _matter_.

As with anything on the Web, you work with what you've got, and if you can fix and share your fix, we all will love you for it. I seriously don't think I understand what you're getting at here; it's been this way since the Web popped into existence, and I don't really want it to be any other way.

> > No it's not; if you design your system RESTfully (which, indeed, HTTP is) then the discovery part can be fast, cached, and, using URI templates embedded in HTTP responses, fully flexible and fit for your purposes.
>
> These URIs are _external_ URIs from third parties, I have no control over whether they are designed RESTfully or not.

Not sure I follow this one. There are no good or bad RESTful URIs, just URIs. REST is how your framework works with the URIs.

> In the meantime, I'll continue trying to balance functionality, maintainability, future expansion, and the programming and hardware resources available to me, same as I always do, here in the real world when we're building production apps, not R&D experiments

My day job is to balance functionality, maintainability, future expansion, and the programming and hardware resources available to me, same as I always do, here in the real world when we're building production apps ... and I'm using Topic Maps and SemWeb technologies. Is there something I'm doing which degrades my work to an R&D experiment, something I should let my customers
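[A minimal sketch of the fixed-URL-plus-parameter resolver Alexander describes: the identifier travels as an ordinary query parameter, so nobody has to snoop the path portion of the URI. The example.org base URL is hypothetical. One nice side effect is that standard query encoding handles SuDoc's web-unfriendly characters (spaces, colons, slashes) automatically.]

```python
from urllib.parse import urlencode, urlsplit, parse_qs

RESOLVER = "http://example.org/resolve"  # hypothetical fixed base URL

def resolver_url(scheme, identifier):
    # Client side: build the resolver URL. urlencode percent-encodes
    # the identifier, so any character SuDoc throws at us is safe.
    return RESOLVER + "?" + urlencode({"scheme": scheme, "id": identifier})

url = resolver_url("sudoc", "Y 4.F 76/2:AF 8/12")
print(url)

# Server side: getting the identifier back is plain query parsing,
# with no assumptions about the structure of the URI path.
params = parse_qs(urlsplit(url).query)
print(params["scheme"][0], params["id"][0])
```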
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
The difference between URIs and URLs? I don't believe that URL is something that exists any more in any standard; it's all URIs. Correct me if I'm wrong.

I don't entirely agree with either dogmatic side here, but I do think that we've arrived at an awfully confusing (for developers) environment. Re-reading the various semantic web TAG position papers people keep referencing, I actually don't entirely agree with all of their principles in practice.

Jonathan

From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Alexander Johannesen [alexander.johanne...@gmail.com]
Sent: Tuesday, April 14, 2009 9:27 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

Hiya,

Been meaning to jump into this discussion for a while, but I've been off to an alternative universe and I can't even say it's good to be back. :) Anyhoo ...

On Fri, Apr 3, 2009 at 03:48, Ray Denenberg, Library of Congress r...@loc.gov wrote:
> You're right, if there were a web: URI scheme, the world would be a better place. But it's not, and the world is worse off for it.

I'm rather confused by this statement. The web: URI scheme? The Web *is* the URI scheme; they are all identifiers to resources (ftp: http: gopher: https: etc.), and together they make up the, um, web of things. What am I missing?

> Back in the old days, URIs (or URLs) were protocol based.

No, which one do you mean, URIs or URLs?

> The ftp scheme was for retrieving documents via ftp. The telnet scheme was for telnet. And so on.

Again, have I missed something? This has changed, as opposed to the good old days?

> A few years later the semantic web was conceived and a lot of SW people began coining all manner of http URIs that had nothing to do with the http protocol.

I've been browsing back and forth this discussion, and couldn't find much to back this up. What do you mean by this?

> Instead, they should have bit the bullet and coined a new scheme. They didn't, and that's why we're in the mess we're in.

I'm sorry, but mess? Did you know the messiness of the web is probably what made it successful? Not to mention that having URIs be identifiers *and* having the ability to resolve them is a bonus; they're identifiers of things (as they've always been -- as I'm sure you know, URI stands for Uniform Resource Identifier, right? :), as in they consist of a string of characters used to identify or name a resource on the Internet. And then, if you so choose, you can use the protocol level to *resolve* them. Not sure how anyone can consider this to be bad, though. Or is this just a misunderstanding of the difference between URIs and URLs?

Kind regards,
Alexander
--
Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps -- http://shelter.nu/blog/
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
From: Jonathan Rochkind rochk...@jhu.edu
> The difference between URIs and URLs? I don't believe that URL is something that exists any more in any standard, it's all URIs.

The URL is alive and well. The W3C definition (http://www.w3.org/TR/uri-clarification/): "a URL is a type of URI that identifies a resource via a representation of its primary access mechanism (e.g., its network location), rather than by some other attributes it may have." Thus, as we noted, http: is a URI scheme. An http URI is a URL. SRU, for example, considers its request to be a URL. I do think this conversation has played itself out.

--Ray
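[The locator/name distinction Ray cites can be seen mechanically. Both strings below are well-formed URIs and parse identically with Python's stdlib, but only the first names its access mechanism (http) and a network location, making it a URL in the W3C's sense; the urn: example is a name with no location. Both example values are illustrative.]

```python
from urllib.parse import urlparse

# An http URI: also a URL, since the scheme plus netloc tell you
# how and where to retrieve a representation.
http_uri = urlparse("http://www.loc.gov/standards/sru/")

# A urn URI: an identifier only; nothing here says how to fetch anything.
urn_uri = urlparse("urn:isbn:0451450523")

print(http_uri.scheme, http_uri.netloc)  # http www.loc.gov
print(urn_uri.scheme, urn_uri.path)      # urn isbn:0451450523
```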
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
On Tue, Apr 14, 2009 at 23:34, Jonathan Rochkind rochk...@jhu.edu wrote:
> The difference between URIs and URLs? I don't believe that URL is something that exists any more in any standard, it's all URIs. Correct me if I'm wrong.

Sure it exists: URLs are a subset of URIs. URLs are locators as opposed to just identifiers (which is an important distinction, much used in SemWeb lingo), where URLs are closer to the protocol-like things Ray describes (or so I think).

> I don't entirely agree with either dogmatic side here, but I do think that we've arrived at an awfully confusing (for developers) environment.

But what about it is confusing (apart from us having this discussion :) ? Is it that we have IDs that happen to *also* resolve? And why is that confusing?

> Re-reading the various semantic web TAG position papers people keep referencing, I actually don't entirely agree with all of their principles in practice.

Well, let me just say that there's more to SemWeb than what comes out of W3C. :)

Kind regards,
Alex
--
Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps -- http://shelter.nu/blog/
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Can you show me where this definition of a URL vs. a URI is made in any RFC or standard-like document? Sure, we have a _sense_ of how the connotation is different, but I don't think that sense is actually formalized anywhere. And that's part of what makes it confusing, yeah.

I think the sem web crowd actually embraces this confusingness; they want to have it both ways: "Oh, a URI doesn't need to resolve, it's just an opaque identifier; but you really should use http URIs for all URIs. Why? Because it's important that they resolve." In general, combining two functions in one mechanism is a dangerous and confusing thing to do in data design, in my opinion. By analogy, it's what gets a lot of MARC/AACR2 into trouble. It's also often a very convenient thing to do, and convenience matters. Although, ironically, my problem with some of those TAG documents is actually that they privilege pure theory over practical convenience.

Over in http://www.w3.org/2001/tag/doc/URNsAndRegistries-50-2006-08-17.html they suggest, under "URI opacity": 'Agents making use of URIs SHOULD NOT attempt to infer properties of the referenced resource.' I understand why that makes sense in theory, but it's entirely impractical for me, as I discovered with the SuDoc experiment (which turned out to be a useful experiment at least in understanding my own requirements). If I get a URI representing (e.g.) a SuDoc (or an ISSN, or an LCCN), I need to be able to tell from the URI alone that it IS a SuDoc, AND I need to be able to extract the actual SuDoc identifier from it. That completely violates their opacity requirement, but it's entirely infeasible to require me to make an individual HTTP request for every URI I find, to figure out what it IS.

Infeasible for performance and cost reasons, and infeasible because it requires a lot more development effort at BOTH ends -- it means that every single URI _would_ have to de-reference to an RDF representation capable of telling me it identifies a SuDoc and what the actual bare SuDoc is. Contrary to the protestations that a URI is different than a URL and does not need to resolve, following the opacity recommendation/requirement would mean that resolution would be absolutely required in order for me to use it. Meaning that someone minting the URI would have to provide that infrastructure, and I as a client would have to write code to use it. But I just want a darn SuDoc in a URI -- and there are advantages to putting a SuDoc in a URI _precisely_ so it can be used in URI-using infrastructures like RDF, and these advantages hold _even if_ it's not resolvable and we ignore the 'opacity' recommendation.

There are trade-offs. I think a lot of that TAG stuff privileges the theoretically pure over the on-the-ground practicalities. They've got a great fantasy in their heads of what the semantic web _could_ be, and I agree it's theoretically sound and _could_ be; but you've got to make it convenient and cheap if you actually want it to happen for real, sometimes sacrificing theoretical purity. And THAT'S one important lesson of the success of the WWW.

Jonathan

From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Alexander Johannesen [alexander.johanne...@gmail.com]
Sent: Tuesday, April 14, 2009 9:48 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

> [...]
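[The pattern Jonathan describes -- telling from the URI alone that it carries a SuDoc and extracting the bare identifier -- amounts to matching against a prefix whose minting policy you happen to know. The purl.example.org prefix below is hypothetical; a real client would hold one compiled pattern per identifier scheme (SuDoc, ISSN, LCCN) it recognizes, and leave everything else opaque.]

```python
import re
from urllib.parse import unquote

# Hypothetical URI prefix for SuDoc; in practice this would be whatever
# prefix the URI minter (e.g. a purl.org namespace) actually chose.
SUDOC_PATTERN = re.compile(r"^http://purl\.example\.org/sudoc/(?P<id>.+)$")

def extract_sudoc(uri):
    """Return the bare (percent-decoded) SuDoc if the URI matches the
    known minting policy, else None (the URI stays opaque to us)."""
    m = SUDOC_PATTERN.match(uri)
    return unquote(m.group("id")) if m else None

print(extract_sudoc("http://purl.example.org/sudoc/Y%204.F%2076%2F2%3AAF%208%2F12"))
# A URI minted under a policy we don't know yields None -- no HTTP
# round trip required either way:
print(extract_sudoc("http://example.com/something-else"))
```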
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Thanks Ray. By that definition ALL http URIs are URLs, a priori. I read Alexander as trying to make a different distinction.

Ray Denenberg, Library of Congress wrote:
> The URL is alive and well. The W3C definition (http://www.w3.org/TR/uri-clarification/): "a URL is a type of URI that identifies a resource via a representation of its primary access mechanism (e.g., its network location), rather than by some other attributes it may have." Thus, as we noted, http: is a URI scheme. An http URI is a URL. SRU, for example, considers its request to be a URL. I do think this conversation has played itself out.
> --Ray
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Jonathan Rochkind
Sent: Tuesday, April 14, 2009 10:21 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

> Over in http://www.w3.org/2001/tag/doc/URNsAndRegistries-50-2006-08-17.html they suggest, under "URI opacity": 'Agents making use of URIs SHOULD NOT attempt to infer properties of the referenced resource.' [...] If I get a URI representing (e.g.) a SuDoc (or an ISSN, or an LCCN), I need to be able to tell from the URI alone that it IS a SuDoc, AND I need to be able to extract the actual SuDoc identifier from it. That completely violates their Opacity requirement, but it's entirely infeasible to require me to make an individual HTTP request for every URI I find, to figure out what it IS.

Jonathan, you need to take URI opacity in context. The document is correct in suggesting that user agents should not attempt to infer properties of the referenced resource. The Architecture of the Web is also clear on this point and includes an example: just because a resource URI ends in .html does not mean that HTML will be the representation returned. The user agent is inferring a property by looking at the end of the URI to see if it ends in .html, e.g., that the Web document will be returning HTML. If you really want to know for sure, you need to dereference it with a HEAD request.

Now, having said that, URI opacity applies to user agents dealing with *any* URIs that they come across in the wild. They should not try to infer any semantics from the URI itself. However, this doesn't mean that the minter of a URI cannot create a policy decision for a group of URIs under their control that contain semantics. In your example, you made a policy decision about the URIs you were minting for SuDocs such that the actual SuDoc identifier would appear someplace in the URI. This is perfectly fine and is the basis for REST URIs, but understand that you created a specific policy statement for those URIs, and if a user agent is aware of your policy statements about the URIs you mint, then it can infer semantics from the URIs you minted.

Does that break URI opacity from a user agent's perspective? No. It just means that those user agents who know about your policy can infer semantics from your URIs, and those that don't should not infer any semantics, because they don't know what the policies are; e.g., you could be returning PDF representations when the URI ends in .html, if that was your policy, and the only way for a user agent to know that is to dereference the URI with either HEAD or GET when it doesn't know what the policies are.

Andy.
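[Andy's .html example, sketched in Python. The first function is the inference a user agent must NOT rely on for URIs it doesn't control; the second is the reliable route he describes, dereferencing with a HEAD request and reading the Content-Type the server actually declares. The example.com URIs are placeholders, and the HEAD helper is shown but not called here since it needs a live server.]

```python
from urllib.request import Request, urlopen
import mimetypes

def naive_guess(uri):
    # Guessing the representation from how the URI is spelled --
    # exactly the property-inference the TAG document warns against.
    return mimetypes.guess_type(uri)[0]

def actual_content_type(uri):
    # The reliable route: dereference with HEAD and trust the server's
    # declared Content-Type, whatever the URI's suffix suggests.
    with urlopen(Request(uri, method="HEAD")) as resp:
        return resp.headers.get_content_type()

# The guess below says text/html, but a server whose policy differs
# could just as well return application/pdf for the same URI:
print(naive_guess("http://example.com/report.html"))
```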
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Am I not an agent making use of a URI who is attempting to infer properties from it? Like that it represents a SuDoc, and in particular what that SuDoc is? If this kind of talmudic parsing of the TAG recommendations to figure out what they _really_ mean is necessary, I stand by my statement that the environment those TAG documents are encouraging is a confusing one.

Jonathan

Houghton, Andrew wrote:
> Jonathan, you need to take URI opacity in context. The document is correct in suggesting that user agents should not attempt to infer properties of the referenced resource. [...]
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
The User Agent is understood to be a typical browser, or other piece of software, like wget, curl, etc. It's the thing implementing the client side of the specs. I don't think you are operating as a user agent here so much as you are a server application. That is, assuming I have any idea what you're actually doing.

--Joe

On Tue, Apr 14, 2009 at 11:27 AM, Jonathan Rochkind rochk...@jhu.edu wrote:
> Am I not an agent making use of a URI who is attempting to infer properties from it? Like that it represents a SuDoc, and in particular what that SuDoc is? [...]
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
On Wed, Apr 15, 2009 at 00:20, Jonathan Rochkind rochk...@jhu.edu wrote: Can you show me where this definition of a URL vs. a URI is made in any RFC or standard-like document? From http://www.faqs.org/rfcs/rfc3986.html ; 1.1.3. URI, URL, and URN A URI can be further classified as a locator, a name, or both. The term Uniform Resource Locator (URL) refers to the subset of URIs that, in addition to identifying a resource, provide a means of locating the resource by describing its primary access mechanism (e.g., its network location). The term Uniform Resource Name (URN) has been used historically to refer to both URIs under the urn scheme [RFC2141], which are required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable, and to any other URI with the properties of a name. An individual scheme does not have to be classified as being just one of name or locator. Instances of URIs from any given scheme may have the characteristics of names or locators or both, often depending on the persistence and care in the assignment of identifiers by the naming authority, rather than on any quality of the scheme. Future specifications and related documentation should use the general term URI rather than the more restrictive terms URL and URN [RFC3305]. As you can see, an URI is an identifier, and a URL is a locator (mechanism for retrieval), and since a URL is a subset of an URI, you _can_ resolve URIs as well. Sure, we have a _sense_ of how the connotation is different, but I don't think that sense is actually formalized anywhere. 
It is, and the same stuff is documented in Wikipedia as well: http://en.wikipedia.org/wiki/Uniform_Resource_Identifier http://en.wikipedia.org/wiki/Uniform_Resource_Locator I think the sem web crowd actually embraces this confusingness, No, I think they take it at face value: they (the URIs) are identifiers for things, and can be used for just that purpose, but they are also URLs, which means they resolve to something. What I think you're getting at is the *thing* it resolves to, as *that* has no definition. But then, if you go from RDF to Topic Maps PSIs (PSIs are URIs with an extended meaning), *that* thing it resolves to indeed has a definition; it's the prose explaining what the identifier identifies, and this is the most important difference between RDF and Topic Maps (and a very subtle but important difference, too). they want to have it both ways: Oh, a URI doesn't need to resolve, it's just an opaque identifier; but you really should use http URIs for all URIs; why? because it's important that they resolve. I smell straw-man. :) But yes, they do want both, as both is in fact a friggin' smart thing to have. We all deal with identifiers all the time, in internal as well as external applications, so why not use an identifier scheme that has the added bonus of a resolver mechanism? If you want to be stupid and lock yourself in your limited world, then using them as just identifiers is fine but perhaps a bit, well, stupid. But if you want to be smart about it, realizing that without ontological work there will *never* be proper interop, you use those identifiers and let them resolve to something. And if you're really smart, you let them resolve to either more RDF statements, or, if you're seriously Einsteinly smart, use PSIs (as in Topic Maps) :). In general, combining two functions in one mechanism is a dangerous and confusing thing to do in data design, in my opinion. Because ... ? By analogy, it's what gets a lot of MARC/AACR2 into trouble.
Hmm, and I thought it was crap design that did that, coupled with poor metadata constraints and validation channels, untyped fields, poor tooling, the lack of machine understandability, and the general library idiom of not-invented-here. But correct me if I'm wrong. :) Over in: http://www.w3.org/2001/tag/doc/URNsAndRegistries-50-2006-08-17.html Umm, I'd be wary of taking as canon a draft with editorial notes going back 4 to 5 years that still aren't resolved. In other words, this document isn't relevant to the real world. Yet. They suggest: URI opacity: 'Agents making use of URIs SHOULD NOT attempt to infer properties of the referenced resource.' Well, as a RESTafarian I understand this argument quite well. It's about not assuming too much from the internal structure of the URI. Again, it's an identifier, not a scheme such as a URL where structure is defined. Again, for URIs, don't assume structure, because at this point it isn't a URL. If I get a URI representing (e.g.) a SuDoc (or an ISSN, or an LCCN), I need to be able to tell from the URI alone that it IS a SuDoc, AND I need to be able to extract the actual SuDoc identifier from it. That completely violates their opacity requirement I think you are quite mistaken on this, but before we leap into whether the web is suitable for SuDoc I'd
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Well, the thing is, those sem web folks LIKE what has resulted. They think it's _good_ that http:// can be resolved with a certain protocol in some cases, but can be an arbitrary identifier untied to protocol in others. It definitely is convenient in some cases. I have mixed feelings; I don't think it's a disaster, but I'm not sure it's always a good idea. Jonathan From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Mike Taylor [m...@indexdata.com] Sent: Thursday, April 02, 2009 2:33 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) An account that has a depressing ring of accuracy to it. Ray Denenberg, Library of Congress writes: You're right, if there were a web: URI scheme, the world would be a better place. But it's not, and the world is worse off for it. It shouldn't surprise anyone that I am sympathetic to Karen's criticisms. Here is some of my historical perspective (which may well differ from others'). Back in the old days, URIs (or URLs) were protocol based. The ftp scheme was for retrieving documents via ftp. The telnet scheme was for telnet. And so on. Some of you may remember the ZIG (Z39.50 Implementors Group) back when we developed the z39.50 URI scheme, which was around 1995. Most of us were not wise to the ways of the web that long ago, but we were told, by those who were, that z39.50r: and z39.50s: at the beginning of a URL are explicit indications that the URI is to be resolved by Z39.50. A few years later the semantic web was conceived and a lot of SW people began coining all manner of http URIs that had nothing to do with the http protocol. By the time the rest of the world noticed, there were so many that it was too late to turn back. So instead, history was altered. The company line became "we never told you that the URI scheme was tied to a protocol." Instead, they should have bitten the bullet and coined a new scheme.
They didn't, and that's why we're in the mess we're in. --Ray - Original Message - From: Houghton,Andrew hough...@oclc.org To: CODE4LIB@LISTSERV.ND.EDU Sent: Thursday, April 02, 2009 9:41 AM Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Karen Coyle Sent: Wednesday, April 01, 2009 2:26 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) This really puzzles me, because I thought http referred to a protocol: hypertext transfer protocol. And when you put "http://" in front of something you are indicating that you are sending the following string along to be processed by that protocol. It implies a certain application over the web, just as "mailto:" implies a particular application. Yes, http is the URI for the hypertext transfer protocol. That doesn't negate the fact that it indicates a protocol. RFC 3986 (URI generic syntax) says that http: is a URI scheme, not a protocol. Just because it says http, people make all kinds of assumptions about type of use, persistence, resolvability, etc. As I indicated in a prior message, whoever registered the http URI scheme could have easily used the token web: instead of http:. All the URI scheme in RFC 3986 does is indicate what the syntax of the rest of the URI will look like. That's all. You give an excellent example: mailto. The mailto URI scheme does not imply a particular application. It is a URI scheme with a specific syntax. That URI is often resolved with the SMTP (mail) protocol. Whoever registered the mailto URI scheme could have specified the token as smtp: instead of mailto:. My reading of Cool URIs is that they use the protocol, not just the URI. If they weren't intended to take advantage of http then W3C would have used something else as a URI.
Read through the Cool URIs document and it's not about identifiers, it's all about using the *protocol* in service of identifying. Why use http? I'm assuming here when you say My reading of Cool URIs... means reading the Cool URIs for the Semantic Web document and not the Cool URIs Don't Change document. The Cool URIs for the Semantic Web document is about linked data. Tim Berners-Lee's four linked data principles state: 1. Use URIs as names for things. 2. Use HTTP URIs so that people can look up those names. 3. When someone looks up a URI, provide useful information. 4. Include links to other URIs, so that they can discover more things. (2) is an important aspect to linking. The Web is a hypertext based system that uses HTTP URIs to identify resources. If you want to link, then you
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
No, not identical URIs. Let's say I've put a copy of the schema permanently at each of the following locations. http://www.loc.gov/standards/mods/v3/mods-3-3.xsd http://www.acme.com//mods-3-3.xsd http://www.takoma.org/standards/mods-3-3.xsd Three locations, three URIs. But the issue of redirect or even resolution is irrelevant in the use case I'm citing. I'm talking about the use of an identifier within a protocol, for the sole purpose of identifying an object that the recipient of the URI already has - or if it doesn't have it, it isn't going to retrieve it, it will just fail the request. The purpose of the identifier is to enable the server to determine whether it has the schema that the client is looking for. (And by the way, that should answer Ed's question about a use case.) So the server has some table of schemas, and in that table is the row: [mods schema] [URI identifying the mods schema] It receives the SRU request: http://z3950.loc.gov:7090/voyager?version=1.1&operation=searchRetrieve&query=dinosaur&maximumRecords=1&recordSchema=[URI identifying the mods schema] If the URI identifying the MODS schema in the request matches the URI in the table, then the server knows what schema the client wants, and it proceeds. If there are multiple identifiers then it has to have a row in its table for each. Does that make sense? --Ray - Original Message - From: Ross Singer rossfsin...@gmail.com To: CODE4LIB@LISTSERV.ND.EDU Sent: Wednesday, April 01, 2009 2:07 PM Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) Ray, you are absolutely right. These would be bad identifiers. But let's say they're all identical (which I think is what you're saying, right?), then this just strengthens the case for indirection through a service like purl.org. Then it doesn't *matter* that all of these are different locations; there is one URI that represents the concept of what is being kept at these locations.
At the end of the redirect can be some sort of 300 response that lets the client pick which endpoint is right for them - or arbitrarily chooses one for them. -Ross. On Wed, Apr 1, 2009 at 1:59 PM, Ray Denenberg, Library of Congress r...@loc.gov wrote: We do just fine minting our URIs at LC, Andy. But we do appreciate your concern. The analysis of our MODS URIs misses the point, I'm afraid. Let's forget the set I cited (bad example) and assume that the schema is replicated at several locations (geographically dispersed), all of which are planned to house the specific version permanently. The suggestion to designate one as canonical is a good suggestion, but it isn't always possible (for various reasons, possibly political). So I maintain that in this scenario you have several *locations*, none of which serves well as an identifier. I'm not arguing (here) that info is better than http (for this scenario), just that these are not good identifiers. --Ray - Original Message - From: Houghton,Andrew hough...@oclc.org To: CODE4LIB@LISTSERV.ND.EDU Sent: Wednesday, April 01, 2009 1:21 PM Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Karen Coyle Sent: Wednesday, April 01, 2009 1:06 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) The general convention is that "http://" is a web address, a location. I realize that it's also a form of URI, but that's a minority use of http. This leads to a great deal of confusion. I understand the desire to use domain names as a way to create unique, managed identifiers, but the http part is what is causing us problems. http:// is an HTTP URI, defined by RFC 3986; loosely, I will agree that it is a web address. However, it is not a location. URIs according to RFC 3986 are just tokens to identify resources.
These tokens, i.e., URIs, are presented to protocol mechanisms as part of the dereferencing process to locate and retrieve a representation of the resource. People see http: and assume that it means the HTTP protocol, so it must be a locator. Whoever initially registered the HTTP URI scheme could have used web as the token instead, and we would all be doing: web://example.org/. This is the confusion. People don't understand what RFC 3986 is saying. It makes no claim that any registered URI scheme has persistence or can be dereferenced. An HTTP URI is just a token to identify some resource, nothing more. Andy.
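Ray's SRU use case upthread, where the recordSchema URI is matched against a table rather than resolved, is a concrete instance of Andy's point that a URI is just an identifying token. A minimal sketch under that reading; the table and function name are mine, and the key simply reuses one of the MODS locations from Ray's example purely as an identifier:

```python
# The recordSchema URI is used purely as an identifier: the server either
# recognizes it in its table or fails the request. No resolution happens.
KNOWN_SCHEMAS = {
    "http://www.loc.gov/standards/mods/v3/mods-3-3.xsd": "mods",  # illustrative entry
}

def resolve_record_schema(record_schema_uri: str) -> str:
    """Map a recordSchema URI from an SRU request to an internal schema name."""
    if record_schema_uri not in KNOWN_SCHEMAS:
        raise ValueError("unsupported record schema: " + record_schema_uri)
    return KNOWN_SCHEMAS[record_schema_uri]
```

Whether the URI could also be dereferenced is irrelevant to this code path, which is exactly Ray's point.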
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Ray Denenberg, Library of Congress Sent: Wednesday, April 01, 2009 1:59 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) We do just fine minting our URIs at LC, Andy. But we do appreciate your concern. Sorry Ray, that statement wasn't directed at LC in particular, but was a general statement. OCLC doesn't do any better in this area, especially with WorldCat, where there are the same issues I pointed out with your examples and additional issues to boot. The point I was trying to make was that *all* organizations need to have clear policies on creating, maintaining, persistence, etc. Failure to do so creates a big mess that takes time to fix, often creating headaches for those using an organization's URIs. Take for example when NISO redesigned their site and broke all the URIs to their standards. Tim Berners-Lee addresses this in his Cool URIs Don't Change article. From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Ross Singer Sent: Wednesday, April 01, 2009 2:07 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) Ray, you are absolutely right. These would be bad identifiers. But let's say they're all identical (which I think is what you're saying, right?), then this just strengthens the case for indirection through a service like purl.org. Then it doesn't *matter* that all of these are different locations; there is one URI that represents the concept of what is being kept at these locations. At the end of the redirect can be some sort of 300 response that lets the client pick which endpoint is right for them - or arbitrarily chooses one for them. Exactly, but purl.org is just using standard HTTP protocol mechanisms, which could easily be done by LC's site given Ray's examples.
What is at issue is the identification of a Real World Object URI for MODS v3.3. Whether I get back an XML schema, a RelaxNG schema, etc., these are just Web Documents, representations of that abstract Real World Object. What Ross did was make the PURL the Real World Object URI for MODS v3.3 and use it to redirect to the geographically distributed Web Documents, i.e., representations. LC could just as well have minted one under its own domain. Andy.
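The Real World Object vs. Web Document split Andy relies on here comes from the W3C TAG's httpRange-14 resolution, which in practice is a status-code convention: a 2xx answer means the URI names an information resource, while a 303 See Other redirects to a description of something that cannot itself be transmitted. A hedged sketch of that decision rule (the function and its labels are mine, for illustration only):

```python
# Sketch of the httpRange-14 convention: how a server signals what kind
# of resource an HTTP URI identifies.
def response_status(uri_kind: str) -> int:
    if uri_kind == "web_document":
        return 200  # 2xx: the URI identifies an information resource
    if uri_kind == "real_world_object":
        return 303  # See Other: redirect to a *description* of the thing
    raise ValueError("unknown kind: " + uri_kind)
```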
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Karen Coyle Sent: Wednesday, April 01, 2009 2:26 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) This really puzzles me, because I thought http referred to a protocol: hypertext transfer protocol. And when you put "http://" in front of something you are indicating that you are sending the following string along to be processed by that protocol. It implies a certain application over the web, just as "mailto:" implies a particular application. Yes, http is the URI for the hypertext transfer protocol. That doesn't negate the fact that it indicates a protocol. RFC 3986 (URI generic syntax) says that http: is a URI scheme, not a protocol. Just because it says http, people make all kinds of assumptions about type of use, persistence, resolvability, etc. As I indicated in a prior message, whoever registered the http URI scheme could have easily used the token web: instead of http:. All the URI scheme in RFC 3986 does is indicate what the syntax of the rest of the URI will look like. That's all. You give an excellent example: mailto. The mailto URI scheme does not imply a particular application. It is a URI scheme with a specific syntax. That URI is often resolved with the SMTP (mail) protocol. Whoever registered the mailto URI scheme could have specified the token as smtp: instead of mailto:. My reading of Cool URIs is that they use the protocol, not just the URI. If they weren't intended to take advantage of http then W3C would have used something else as a URI. Read through the Cool URIs document and it's not about identifiers, it's all about using the *protocol* in service of identifying. Why use http? I'm assuming here when you say My reading of Cool URIs... means reading the Cool URIs for the Semantic Web document and not the Cool URIs Don't Change document.
The Cool URIs for the Semantic Web document is about linked data. Tim Berners-Lee's four linked data principles state: 1. Use URIs as names for things. 2. Use HTTP URIs so that people can look up those names. 3. When someone looks up a URI, provide useful information. 4. Include links to other URIs, so that they can discover more things. (2) is an important aspect to linking. The Web is a hypertext based system that uses HTTP URIs to identify resources. If you want to link, then you need to use HTTP URIs. There is only one protocol, today, that accepts HTTP URIs as currency, and it's appropriately called HTTP and defined by RFC 2616. The Cool URIs for the Semantic Web document describes how an HTTP protocol implementation (of RFC 2616) should respond to a dereference of an HTTP URI. It's important to understand that URIs are just tokens that *can* be presented to a protocol for resolution. It's up to the protocol to define the currency that it will accept, e.g., HTTP URIs, and it's up to an implementation of the protocol to define the tokens of that currency that it will accept. It just so happens that HTTP URIs are accepted by the HTTP protocol, but in the case of mailto URIs they are accepted by the SMTP protocol. However, it is important to note that an HTTP user agent, e.g., a browser, accepts both HTTP and mailto URIs. It decides whether it should send the mailto URI to an SMTP user agent, e.g., Outlook, Thunderbird, etc., or dereference the HTTP URI with the HTTP protocol. In fact, the HTTP protocol doesn't directly accept HTTP URIs. As part of the dereference process the HTTP user agent needs to break apart the HTTP URI and present it to the HTTP protocol. For example the HTTP URI: http://example.org/ becomes the HTTP protocol request: GET / HTTP/1.1 Host: example.org Think of a URI as a minted token. The New York subway mints tokens to ride the subway to get to a destination. Placing a U.S.
quarter or a Boston subway token in a turnstile will not allow you to pass. You must use the New York subway minted token, i.e., its currency. URIs are the same. OCLC can mint HTTP URI tokens and LC can mint HTTP URI tokens; both are using the HTTP URI currency, but sending LC HTTP URI tokens, e.g., Boston subway tokens, to OCLC's Web server will most likely result in a 404. You cannot pass, since OCLC's Web server only accepts OCLC tokens, e.g., New York subway tokens, that identify a resource under its control. Andy.
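Andy's dereference example just above, where the user agent breaks the URI apart before speaking HTTP, can be sketched as follows; `http_request_lines` is a hypothetical helper name, not part of any library:

```python
from urllib.parse import urlparse

def http_request_lines(uri: str) -> list[str]:
    """Turn an HTTP URI into the request line and Host header that the
    user agent actually sends on the wire, per RFC 2616."""
    parts = urlparse(uri)
    path = parts.path or "/"          # an empty path dereferences as "/"
    if parts.query:
        path += "?" + parts.query
    return [f"GET {path} HTTP/1.1", f"Host: {parts.netloc}"]
```

For http://example.org/ this yields the request line GET / HTTP/1.1 and the header Host: example.org, matching Andy's example.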
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Mike Taylor Sent: Thursday, April 02, 2009 8:41 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) I have to say I am suspicious of schemes like PURL, which for all their good points introduce a single point of failure into, well, everything that uses them. That can't be good. Especially as it's run by the same company that also runs the often-unavailable OpenURL registry. What you are saying is that you are suspicious of the HTTP protocol. All the PURL server does is use mechanisms specified by the HTTP protocol. Any HTTP server is capable of implementing those same mechanisms. The actual PURL server is a community-based service that allows people to create HTTP URIs that redirect to other URIs without having to run an actual HTTP server. If you don't like its single point of failure, then create your own in-house service using your existing HTTP server. I believe the source code for the entire PURL service is freely available, and other people have taken the opportunity to run their own in-house or community-based services. Andy.
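Andy's claim that any HTTP server can provide PURL-style indirection boils down to serving redirects from a table. A minimal in-house sketch using only the Python standard library; the path and target are invented for illustration (the target reuses one of the MODS schema locations mentioned earlier in the thread):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Persistent path -> current location. Updating the table re-points the
# persistent URI without breaking anyone who has stored it.
REDIRECTS = {
    "/standards/mods/v3": "http://www.loc.gov/standards/mods/v3/mods-3-3.xsd",
}

class PurlStyleHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = REDIRECTS.get(self.path)
        if target is None:
            self.send_error(404, "No such persistent URI")
            return
        self.send_response(302)             # a plain HTTP redirect, nothing more
        self.send_header("Location", target)
        self.end_headers()

# To run: HTTPServer(("", 8080), PurlStyleHandler).serve_forever()
```

Nothing here is PURL-specific; it is exactly the "standard HTTP protocol mechanisms" Andy describes.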
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Houghton,Andrew writes: I have to say I am suspicious of schemes like PURL, which for all their good points introduce a single point of failure into, well, everything that uses them. That can't be good. Especially as it's run by the same company that also runs the often-unavailable OpenURL registry. What you are saying is that you are suspicious of the HTTP protocol. That is NOT what I am saying. I am saying I am suspicious of a single point of failure. Especially since the entire architecture of the Internet was (rightly IMHO) designed with the goal of avoiding SPOFs. _/|____ /o ) \/ Mike Taylor m...@indexdata.com http://www.miketaylor.org.uk )_v__/\ In My Egotistical Opinion, most people's C programs should be indented six feet downward and covered with dirt -- Blair P. Houghton.
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Houghton,Andrew wrote: RFC 3986 (URI generic syntax) says that http: is a URI scheme, not a protocol. Just because it says http, people make all kinds of assumptions about type of use, persistence, resolvability, etc. And RFC 2616 (Hypertext transfer protocol) says: The HTTP protocol is a request/response protocol. A client sends a request to the server in the form of a request method, URI, and protocol version, followed by a MIME-like message containing request modifiers, client information, and possible body content over a connection with a server. So what you are saying is that it's OK to use the URI for the hypertext transfer protocol in a way that ignores RFC 2616. I'm just not sure how functional that is, in the grand scheme of things. And when you say: The Cool URIs for the Semantic Web document describes how an HTTP protocol implementation (of RFC 2616) should respond to a dereference of an HTTP URI. I think you are deliberately distorting the intent of the Cool URIs document. You seem to read it as: *given* an http URI, here is how the protocol should respond. But in fact the Cool URIs document asks the question So the question is, what URIs should we use in RDF? and responds that one should use http URIs for the reason that: Given only a URI, machines and people should be able to retrieve a description about the resource identified by the URI from the Web. Such a look-up mechanism is important to establish shared understanding of what a URI identifies. Machines should get RDF data and humans should get a readable representation, such as HTML. The standard Web transfer protocol, HTTP, should be used. So it doesn't just say how to respond to an http URI; it says to use http URIs *because* there is a useful possible response. That's a very different statement. It is significant that (as Mike pointed out, perhaps inadvertently) no one is using mailto: or ftp: as identifiers. That's not a coincidence.
kc -- --- Karen Coyle / Digital Library Consultant kco...@kcoyle.net http://www.kcoyle.net ph.: 510-540-7596 skype: kcoylenet fx.: 510-848-3913 mo.: 510-435-8234
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Houghton,Andrew writes: I have to say I am suspicious of schemes like PURL, which for all their good points introduce a single point of failure into, well, everything that uses them. That can't be good. Especially as it's run by the same company that also runs the often-unavailable OpenURL registry. What you are saying is that you are suspicious of the HTTP protocol. That is NOT what I am saying. I am saying I am suspicious of a single point of failure. Especially since the entire architecture of the Internet was (rightly IMHO) designed with the goal of avoiding SPOFs. OK, good, then if you are concerned about the PURL service's SPOF, take the freely available PURL software, create a distributed PURL-based system, and put it up for the community. Why would I want to do this when I could just Not Use PURLs? Anyway, we're way off the subject now -- I guess if we want to argue about the utility of PURL we could get a room :-) _/|____ /o ) \/ Mike Taylor m...@indexdata.com http://www.miketaylor.org.uk )_v__/\ The cladistic definition of Aves is: an unimportant offshoot of the much cooler dinosaur family which somehow managed to survive the K/T boundary intact -- Eric Lurio.
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Houghton,Andrew wrote: OK, good, then if you are concerned about the PURL service's SPOF, take the freely available PURL software, create a distributed PURL-based system, and put it up for the community. I think several people have looked at this, but I have not heard of any progress or implementations. Andy. The California Digital Library ran the PURL software for a while, using it to mint identifiers for digital documents. It was a while back, but someone there may remember how it went. kc -- --- Karen Coyle / Digital Library Consultant kco...@kcoyle.net http://www.kcoyle.net ph.: 510-540-7596 skype: kcoylenet fx.: 510-848-3913 mo.: 510-435-8234
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Karen Coyle Sent: Thursday, April 02, 2009 10:15 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) Houghton,Andrew wrote: RFC 3986 (URI generic syntax) says that http: is a URI scheme, not a protocol. Just because it says http, people make all kinds of assumptions about type of use, persistence, resolvability, etc. And RFC 2616 (Hypertext transfer protocol) says: The HTTP protocol is a request/response protocol. A client sends a request to the server in the form of a request method, URI, and protocol version, followed by a MIME-like message containing request modifiers, client information, and possible body content over a connection with a server. So what you are saying is that it's OK to use the URI for the hypertext transfer protocol in a way that ignores RFC 2616. I'm just not sure how functional that is, in the grand scheme of things. You missed the whole point: URIs, specified by RFC 3986, are just tokens that are divorced from protocols, like RFC 2616, but often work in conjunction with them to retrieve a representation of the resource identified by the URI. It is up to the protocol to decide which URI schemes it will accept. In the case of RFC 2616, there is a one-to-one relationship, today, with the HTTP URI scheme. RFC 2616 could also have said it would accept other URI schemes too, or another protocol could be defined, tomorrow, that also accepts the HTTP URI scheme, giving the HTTP URI scheme a one-to-many relationship between its scheme and the protocols that accept it. And when you say: The Cool URIs for the Semantic Web document describes how an HTTP protocol implementation (of RFC 2616) should respond to a dereference of an HTTP URI. I think you are deliberately distorting the intent of the Cool URIs document.
You seem to read it as: *given* an http URI, here is how the protocol should respond. But in fact the Cool URIs document asks the question So the question is, what URIs should we use in RDF? and responds that one should use http URIs for the reason that: Given only a URI, machines and people should be able to retrieve a description about the resource identified by the URI from the Web. Such a look-up mechanism is important to establish shared understanding of what a URI identifies. Machines should get RDF data and humans should get a readable representation, such as HTML. The standard Web transfer protocol, HTTP, should be used. The answer to the question posed in the document is based on Tim Berners-Lee's four linked data principles, one of which states to use HTTP URIs. Nobody, as far as I know, has created a hypertext based system based on the URN or info URI schemes. The only hypertext based system available today is the Web, which is based on the HTTP protocol that accepts HTTP URIs. So you cannot effectively accomplish linked data on the Web without using HTTP URIs. The document has an RDF / Semantic Web slant, but Tim Berners-Lee's four linked data principles say nothing about RDF or the Semantic Web. Those four principles might be more aptly named the four linked information principles for the Web. Further, the document does go on to describe how an HTTP server (an implementation of RFC 2616) should respond to requests for Real World Objects, Generic Documents and Web Documents, which is based on the W3C TAG decisions for httpRange-14 and genericResources-53. The scope of the document clearly says: This document is a practical guide for implementers of the RDF specification... It explains two approaches for RDF data hosted on HTTP servers... Section 2.1 discusses HTTP and content negotiation for Generic Documents.
Section 4 discusses how the HTTP server should respond, with diagrams and actual HTTP status codes, to let user agents know which URIs are Real World Objects vs. Generic Documents and Web Documents, per the W3C TAG decisions on httpRange-14 and genericResources-53. Section 6 directly addresses the question that this thread has been talking about, namely using new URI schemes, like URN and info, and why they are not acceptable in the context of linked data. And here is a quote which is what I have said over and over again about URIs being tokens divorced from protocols: To be truly useful, a new scheme must be accompanied by a protocol defining how to access more information about the identified resource. For example, the ftp:// URI scheme identifies resources (files on an FTP server), and also comes with a protocol for accessing them (the FTP protocol). Some of the new URI schemes provide no such protocol at all. Others provide a Web Service that allows retrieval of descriptions using the HTTP protocol. The identifier is passed to the service, which looks up
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Karen Coyle writes: OK, good, then if you are concerned about the PURL service's SPOF, take the freely available PURL software, create a distributed PURL-based system, and put it up for the community. I think several people have looked at this, but I have not heard of any progress or implementations. The California Digital Library ran the PURL software for a while, using it to mint identifiers for digital documents. It was a while back, but someone there may remember how it went. Wait, what? They _were_ running a PURL resolver, but now they're not? What does the P in PURL stand for again? _/|____ /o ) \/ Mike Taylor m...@indexdata.com http://www.miketaylor.org.uk )_v__/\ Wagner's music is nowhere near as bad as it sounds -- Mark Twain.
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
You're right, if there were a web: URI scheme, the world would be a better place. But it's not, and the world is worse off for it. It shouldn't surprise anyone that I am sympathetic to Karen's criticisms. Here is some of my historical perspective (which may well differ from others'). Back in the old days, URIs (or URLs) were protocol based. The ftp scheme was for retrieving documents via ftp. The telnet scheme was for telnet. And so on. Some of you may remember the ZIG (Z39.50 Implementors Group) back when we developed the z39.50 URI scheme, which was around 1995. Most of us were not wise to the ways of the web that long ago, but we were told, by those who were, that z39.50r: and z39.50s: at the beginning of a URL are explicit indications that the URI is to be resolved by Z39.50. A few years later the semantic web was conceived and a lot of SW people began coining all manner of http URIs that had nothing to do with the http protocol. By the time the rest of the world noticed, there were so many that it was too late to turn back. So instead, history was altered. The company line became "we never told you that the URI scheme was tied to a protocol." Instead, they should have bitten the bullet and coined a new scheme. They didn't, and that's why we're in the mess we're in. --Ray - Original Message - From: Houghton,Andrew hough...@oclc.org To: CODE4LIB@LISTSERV.ND.EDU Sent: Thursday, April 02, 2009 9:41 AM Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Karen Coyle Sent: Wednesday, April 01, 2009 2:26 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) This really puzzles me, because I thought http referred to a protocol: hypertext transfer protocol.
And when you put http:// in front of something you are indicating that you are sending the following string along to be processed by that protocol. It implies a certain application over the web, just as mailto: implies a particular application. Yes, http is the URI for the hypertext transfer protocol. That doesn't negate the fact that it indicates a protocol. RFC 3986 (URI generic syntax) says that http: is a URI scheme, not a protocol. Just because it says http, people make all kinds of assumptions about type of use, persistence, resolvability, etc. As I indicated in a prior message, whoever registered the http URI scheme could have easily used the token web: instead of http:. All the URI scheme in RFC 3986 does is indicate what the syntax of the rest of the URI will look like. That's all. You give an excellent example: mailto. The mailto URI scheme does not imply a particular application. It is a URI scheme with a specific syntax. That URI is often resolved with the SMTP (mail) protocol. Whoever registered the mailto URI scheme could have specified the token as smtp: instead of mailto:. My reading of Cool URIs is that they use the protocol, not just the URI. If they weren't intended to take advantage of http then W3C would have used something else as a URI. Read through the Cool URIs document and it's not about identifiers, it's all about using the *protocol* in service of identifying. Why use http? I'm assuming here when you say My reading of Cool URIs... means reading the Cool URIs for the Semantic Web document and not the Cool URIs Don't Change document. The Cool URIs for the Semantic Web document is about linked data. Tim Berners-Lee's four linked data principles state: 1. Use URIs as names for things. 2. Use HTTP URIs so that people can look up those names. 3. When someone looks up a URI, provide useful information. 4. Include links to other URIs, so that they can discover more things. (2) is an important aspect to linking. 
The Web is a hypertext based system that uses HTTP URIs to identify resources. If you want to link, then you need to use HTTP URIs. There is only one protocol, today, that accepts HTTP URIs as currency, and it's appropriately called HTTP and defined by RFC 2616. The Cool URIs for the Semantic Web document describes how an HTTP protocol implementation (of RFC 2616) should respond to a dereference of an HTTP URI. It's important to understand that URIs are just tokens that *can* be presented to a protocol for resolution. It's up to the protocol to define the currency that it will accept, e.g., HTTP URIs, and it's up to an implementation of the protocol to define the tokens of that currency that it will accept. It just so happens that HTTP URIs are accepted by the HTTP protocol, but in the case of mailto URIs they are accepted by the SMTP protocol. However, it is important to note that an HTTP user agent, e.g., a browser, accepts both HTTP and mailto URIs. It decides that it should send the mailto URI to an SMTP user agent, e.g., Outlook, Thunderbird, etc.
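Andrew's point that a URI scheme is just a syntactic token, not a protocol binding, can be seen directly in code: a generic RFC 3986 parser handles http, mailto, and info URIs with the same logic, extracting the scheme without any network protocol ever being involved. A minimal sketch (the example URIs, including the placeholder DOI 10.1000/example, are illustrative only):

```python
from urllib.parse import urlparse

# RFC 3986 parsing is purely syntactic: no protocol is involved.
for uri in [
    "http://example.org/doc/123",   # usually dereferenced via HTTP
    "mailto:someone@example.org",   # usually handed to an SMTP user agent
    "info:doi/10.1000/example",     # not dereferenceable at all
]:
    parts = urlparse(uri)
    print(parts.scheme, repr(parts.netloc), repr(parts.path))
```

All three parse identically; what (if anything) a user agent then does with the token is a separate decision made per scheme, which is exactly the point being argued.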
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
An account that has a depressing ring of accuracy to it. Ray Denenberg, Library of Congress writes: You're right, if there were a web: URI scheme, the world would be a better place. But it's not, and the world is worse off for it. [snip -- remainder of Ray's message, quoted in full above]
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Hi Ray - At Thu, 2 Apr 2009 13:48:19 -0400, Ray Denenberg, Library of Congress wrote: You're right, if there were a web: URI scheme, the world would be a better place. But it's not, and the world is worse off for it. Well, the original concept of the ‘web’ was, as I understand it, to bring together all the existing protocols (gopher, ftp, etc.), with the new one in addition (HTTP), with one unifying address scheme, so that you could have this ‘web browser’ that you could use for everything. So web: would have been nice, but probably wouldn’t have been accepted. As it turns out, HTTP won overwhelmingly, and the older protocols died off. It shouldn't surprise anyone that I am sympathetic to Karen's criticisms. Here is some of my historical perspective (which may well differ from others'). [snip -- Ray's history of protocol-based URI schemes, quoted in full above] They didn't, and that's why we're in the mess we're in. Not knowing the details of the history, your account seems correct to me, except that I don’t think the web people tried to alter history. 
I think of the web as having been a learning experience for all of us. Yes, we used to think that the URI was tied to the protocol. But we have learned that it doesn’t need to be, that HTTP URIs can be just identifiers which happen to be dereferenceable at the moment using the HTTP protocol. And it became useful to begin identifying lots of things, people and places and so on, using identifiers, and it also seemed useful to use a protocol that existed (HTTP), instead of coming up with the Person-Metadata Transfer Protocol and inventing a new URI scheme (pmtp://...) to resolve metadata about persons. Because HTTP doesn’t care what kind of data it is sending down the line; it can happily send metadata about people. But that is how things grow; the http:// at the beginning of a URI may eventually be a spandrel, when HTTP is dead and buried. And people will wonder why the address http://dx.doi.org/10./xxx has those funny characters in front of it. And doi.org will be long gone, because they ran out of money, and their domain was taken over by squatters, so we all had to agree to alter our browsers to include an override to not use DNS to resolve the dx.doi.org domain but instead point to a new, distributed system of DOI resolution. We will need to fix these problems as they arise. In my opinion, if we are interested in identifier persistence, clarity about the difference between things and information about things, creating a more useful web (of data), and the other things we ought to be interested in, our time is best spent worrying about these things, and how they can be built on top of the web. Our time is not well spent in coming up with new ways to do things that the web already does for us. For instance: if there is concern that HTTP URIs are not seen as being persistent, it would be useful to try to add a method to HTTP which indicated the persistence of an identifier. This way browsers could display a little icon that indicated that the URI was persistent. 
A user could click on this icon and get information about the institution which claimed persistence for the URI, what the level of support was, what other institution could back up that claim, etc. Our time would not be well spent coming up with an elaborate scheme for phttp:// URIs, creating a better DNS, with name control by a better institution, and a better HTTP, with metadata, and a better caching system, and so on. That is a lot of work, and you forget what you were trying to do in the first place, which is make HTTP URIs persistent. best, Erik ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
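Erik's persistence-signal idea could be prototyped without any new protocol machinery, just an extra response header that a browser inspects. To be clear, nothing like this exists: the header name "X-Persistence" and its field syntax below are invented purely to illustrate the shape of the proposal.

```python
def persistence_claim(headers):
    """Return a dict of persistence claims from a (hypothetical)
    X-Persistence response header, or None if no claim is made.

    `headers` is a plain dict of HTTP response headers. The header name
    and its "key=value; key=value" syntax are made up for this sketch.
    """
    value = headers.get("X-Persistence")
    if value is None:
        return None
    return dict(
        item.strip().split("=", 1) for item in value.split(";") if "=" in item
    )

claim = persistence_claim({
    "Content-Type": "text/html",
    "X-Persistence": "level=permanent; maintainer=example-library.org",
})
print(claim)  # {'level': 'permanent', 'maintainer': 'example-library.org'}
```

A browser running logic like this could show Erik's "persistent" icon, and the institution named in the claim could be looked up for the level-of-support details he describes, all layered on top of HTTP as it stands.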
[CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Houghton,Andrew wrote: Let's separate your argument into two pieces. Identification and resolution. The DOI is the identifier and it inherently doesn't tie itself to any resolution mechanism. So creating an info URI for it is meaningless, it's just another alias for the DOI. I can create an HTTP resolution mechanism for DOIs by doing: http://resolve.example.org/?doi=10./j.1475-4983.2007.00728.x or http://resolve.example.org/?uri=info:doi/10./j.1475-4983.2007.00728.x since the info URI contains the natural DOI identifier, wrapping it in a URI scheme has no value when I could have used the DOI identifier directly, as in the first HTTP resolution example. I disagree that wrapping it in a URI scheme has no value. We have a great deal of software, and many schemas, built to store URIs; even if they don't know what the URI is or what can be done with it, we have infrastructure in place for dealing with URIs. So there is value in wrapping a 'natural' identifier in a URI, even if that URI does not carry its own resolution mechanism with it. I have run into this in several places in my own work. I share Mike's concerns about tying resolution to identification in one mechanism. As a sort of general principle or 'pattern' or design, trying to make one mechanism do two jobs at once is a 'bad smell'. It's in fact (I hope this isn't too far afield) how I'd sum up much of the failure of AACR2/MARC, involving our 'controlled headings' (see me expanding on this in some blog posts at http://bibwild.wordpress.com/2008/01/17/identifiers-and-display-labels-again/). On the other hand, it is awfully _convenient_ to combine these two functions in one mechanism. And convenience does matter too. I can see both sides. So I think we just do what feels right, and when we all disagree on what feels right, we pick one. I don't share the opinion of those who think it's obvious that everything should be an http uri, nor do I share the opinion of those who think it's obvious that this is a disaster. 
DOI is definitely one good example of where One Canonical Resolution fails. The DOI _resolution_ system fails for me -- it does not reliably or predictably deliver the right document for my users. But a DOI as an identifier is still useful for me. Even if that DOI were expressed in a URI as http://dx.doi.org/resolve/10./j.1475-4983.2007.00728.x, I STILL wouldn't actually use the HTTP server at dx.doi.org to resolve it. I'd extract the actual DOI out of it, and use a different resolution mechanism. Another example to think about is what happens when the protocol for resolution changes? Right now already we could find a resolution service starting to make available and/or insist upon https protocol resolution. But all those existing identifiers expressed as http URIs should not change; they are meant to be persistent. So already it's possible for an identifier originally intended to describe its own resolution to be slightly wrong. Is this confusing? In the future, maybe we'll have something different than http entirely. Jonathan
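Jonathan's "extract the actual DOI and use my own resolver" workflow amounts to matching the URI against a table of known templates. A small sketch (the template list is illustrative, not exhaustive, and the DOI 10.1000/example is a placeholder; the constant churn of publisher-specific forms is exactly the weakness being discussed):

```python
import re

# Known URI forms that embed a bare DOI. New templates appear all the
# time, so any such table is perpetually incomplete.
DOI_TEMPLATES = [
    re.compile(r"^https?://dx\.doi\.org/(10\..+)$"),
    re.compile(r"^https?://doi\.org/(10\..+)$"),
    re.compile(r"^info:doi/(10\..+)$"),
]

def extract_doi(uri):
    """Return the bare DOI from a recognized URI form, or None."""
    for pattern in DOI_TEMPLATES:
        match = pattern.match(uri)
        if match:
            return match.group(1)
    return None

print(extract_doi("http://dx.doi.org/10.1000/example"))  # 10.1000/example
print(extract_doi("info:doi/10.1000/example"))           # 10.1000/example
```

Once the bare DOI is in hand, it can be handed to whatever resolution mechanism actually works for a given library's users, regardless of what host name the original URI pointed at.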
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
I admit that httpRange-14 still confuses me. (I have no idea why it's called httpRange-14, for one thing). But how do you identify the URI as being a Real World Object? I don't understand what it entails. And http://doi.org/* describes its own type only to software that knows what a URI beginning http://doi.org means, right? What about Eric Hellman's point that there are a variety of possible http URIs (not just possible but _in use_) that encapsulate a DOI, and given software would have to know all of the possible templates (with more being created all the time)? Jonathan Houghton,Andrew wrote: From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Jonathan Rochkind Sent: Wednesday, April 01, 2009 11:08 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) Houghton,Andrew wrote: Let's separate your argument into two pieces. Identification and resolution. The DOI is the identifier and it inherently doesn't tie itself to any resolution mechanism. So creating an info URI for it is meaningless, it's just another alias for the DOI. I can create an HTTP resolution mechanism for DOIs by doing: http://resolve.example.org/?doi=10./j.1475-4983.2007.00728.x or http://resolve.example.org/?uri=info:doi/10./j.1475-4983.2007.00728.x since the info URI contains the natural DOI identifier, wrapping it in a URI scheme has no value when I could have used the DOI identifier directly, as in the first HTTP resolution example. I disagree that wrapping it in a URI scheme has no value. We have a great deal of software and schemas that are built to store URIs; even if they don't know what the URI is or what can be done with it, we have infrastructure in place for dealing with URIs. Oops... that should have read ... wrapping it in an unresolvable URI scheme... 
The point being that: urn:doi:* and info:doi:* provide no advantages over: http://doi.org/* when, per the W3C TAG httpRange-14 decision, you identify the URI as being a Real World Object. When identifying the HTTP URI as a Real World Object, it is the same as what Mike said about the info URI: the identifier describes its own type. Andy.
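For readers as confused by httpRange-14 as Jonathan says he is, the TAG's 2005 resolution reduces to a small decision rule about what a client may conclude from the status code returned when an http URI is dereferenced. A sketch of that rule (the function name and return strings are mine, not the TAG's):

```python
def httprange14_conclusion(status_code):
    """What the httpRange-14 resolution lets a client conclude about the
    resource an http URI identifies, from the dereference status alone."""
    if 200 <= status_code < 300:
        # A 2xx means the URI identifies an information resource (a
        # document); the body is a representation of that resource.
        return "information resource"
    if status_code == 303:
        # A 303 See Other lets a URI name a real-world object: the URI
        # may identify any resource, and the redirect target is a
        # separate document describing it.
        return "any resource; redirect target describes it"
    # Other responses (404, etc.) support no conclusion.
    return "unknown"

print(httprange14_conclusion(200))
print(httprange14_conclusion(303))
```

This is how an http URI "identifies itself" as a Real World Object in practice: not by its spelling, but by the 303 its server answers with, which is the dependence on resolution behavior that several people in this thread find uncomfortable.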
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
On Wed, Apr 1, 2009 at 11:37 AM, Jonathan Rochkind rochk...@jhu.edu wrote: I admit that httpRange-14 still confuses me. (I have no idea why it's called httpRange-14 for one thing). http://www.w3.org/2001/tag/group/track/issues/14 Some background: http://efoundations.typepad.com/efoundations/2009/02/httprange14-cool-uris-frbr.html And http://doi.org/* describes its own type only to software that knows what a URI beginning http://doi.org means, right? How is that different from the software knowing what info:doi/ means? The difference is, how much more software knows what http: means vs. info:? And this, I think, has got to be the point here. How many times do we need to marginalize ourselves with our ideals and expectations that nobody else adheres to before we're rendered completely irrelevant? Doesn't it make sense to co-opt the mainstream processes and apply them to our ideals? What, exactly, is the resistance here? What about Eric Hellman's point that there are a variety of possible http URIs (not just possible but _in use_) that encapsulate a DOI, and given software would have to know all of the possible templates (with more being created all the time)? Right, but here again is where we're talking about the difference between a location and the identifier. We're talking about establishing http://dx.doi.org/10./j.1475-4983.2007.00728.x (or something like that -- http://hdl.handle.net/10./j.1475-4983.2007.00728.x might be more appropriate) as the identifier for doi:10./j.1475-4983.2007.00728.x That you can access it via http://doi.wiley.com/10./j.1475-4983.2007.00728.x (or resolve it there) doesn't mean that that's the identifier for it. -Ross.
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
From: Houghton,Andrew hough...@oclc.org The point being that: urn:doi:* and info:doi:* provide no advantages over: http://doi.org/* I think they do. I realize this is pretty much a dead-end debate, as everyone has dug themselves into a position and nobody is going to change their mind. It is a philosophical debate and there isn't a right answer. But in my opinion (I won't use the doi example because it's overloaded; let's talk about the hypothetical sudoc) I think info:sudoc/xyz provides an advantage over http://sudoc.org/xyz if the latter is not going to resolve. Why? Because it drives me nuts to see http URIs everywhere that give all appearances of resolvability - browsers, editors, etc. turn them into clickable links. Now, if you are setting up a resolution service where you get the document that the sudoc identifies when you click on the URI, then http is appropriate. The *actual document*. Not a description of it in lieu of the document. And the so-called architectural justification that it's ok to return metadata instead of the resource (representation) -- I don't buy it. --Ray
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
On Wed, Apr 1, 2009 at 12:22 PM, Karen Coyle li...@kcoyle.net wrote: But shouldn't we be able to know the difference between an identifier and a locator? Isn't that the problem here? That you don't know which it is if it starts with http://. But you do if it starts with http://dx.doi.org I still don't see the difference. The same logic that would be required to parse and understand the info: URI scheme could just as well be applied to an http URI scheme. -Ross.
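Ross's claim can be made concrete: whatever prefix table a program needs in order to interpret info: URIs, the identical table (and identical code) handles the http forms. A minimal sketch; the prefixes listed are illustrative, and 10.1000/example is a placeholder DOI:

```python
# One lookup table serves both URI styles; the parsing logic does not
# care whether the prefix begins "info:" or "http://".
KNOWN_PREFIXES = {
    "info:doi/": "doi",
    "http://dx.doi.org/": "doi",
    "info:lccn/": "lccn",
    "http://lccn.loc.gov/": "lccn",
}

def identify(uri):
    """Return (identifier_type, value) for a recognized prefix, else None."""
    for prefix, id_type in KNOWN_PREFIXES.items():
        if uri.startswith(prefix):
            return id_type, uri[len(prefix):]
    return None

print(identify("info:doi/10.1000/example"))
print(identify("http://dx.doi.org/10.1000/example"))
```

Both calls yield the same (type, value) pair, which is Ross's point: the "inside knowledge" cost is identical either way, but only the http form also does something useful in an unmodified browser.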
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Ross Singer wrote: On Wed, Apr 1, 2009 at 12:22 PM, Karen Coyle li...@kcoyle.net wrote: But shouldn't we be able to know the difference between an identifier and a locator? Isn't that the problem here? That you don't know which it is if it starts with http://. But you do if it starts with http://dx.doi.org No, *I* don't. And neither does my email program, since it displayed it as a URL (blue and underlined). That's inside knowledge, not part of the technology. Someone COULD create a web site at that address, and there's nothing in the URI itself to tell me if it's a URI or a URL. The general convention is that http:// is a web address, a location. I realize that it's also a form of URI, but that's a minority use of http. This leads to a great deal of confusion. I understand the desire to use domain names as a way to create unique, managed identifiers, but the http part is what is causing us problems. John Kunze's ARK system attempted to work around this by using http to retrieve information about the URI, so you're not just left guessing. It's not a question of resolution, but of giving you a short list of things that you can learn about a URI that begins with http. However, again, unless you know the secret you have no idea that those particular URI/Ls have that capability. So again we're going beyond the technology into some human knowledge that has to be there to take advantage of the capabilities. It doesn't seem so far-fetched to make it possible for programs (dumb, dumb programs) to know the difference between an identifier and a location based on something universal, like a prefix, without having to be coded for dozens or hundreds of exceptions. kc I still don't see the difference. The same logic that would be required to parse and understand the info: uri scheme could be used to apply towards an http uri scheme. -Ross. 
-- --- Karen Coyle / Digital Library Consultant kco...@kcoyle.net http://www.kcoyle.net ph.: 510-540-7596 skype: kcoylenet fx.: 510-848-3913 mo.: 510-435-8234
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Karen Coyle Sent: Wednesday, April 01, 2009 1:06 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?) The general convention is that http:// is a web address, a location. I realize that it's also a form of URI, but that's a minority use of http. This leads to a great deal of confusion. I understand the desire to use domain names as a way to create unique, managed identifiers, but the http part is what is causing us problems. http:// is an HTTP URI, defined by RFC 3986; loosely, I will agree that it is a web address. However, it is not a location. URIs according to RFC 3986 are just tokens to identify resources. These tokens, e.g., URIs, are presented to protocol mechanisms as part of the dereferencing process to locate and retrieve a representation of the resource. People see http: and assume that it means the HTTP protocol, so it must be a locator. Whoever initially registered the HTTP URI scheme could have used web as the token instead, and we would all be doing: web://example.org/. This is the confusion. People don't understand what RFC 3986 is saying. It makes no claim that any registered URI scheme has persistence or can be dereferenced. An HTTP URI is just a token to identify some resource, nothing more. Andy.
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
My point is that I don't see how they're different in practice. And one of them actually allowed you to do something from your email client. -Ross. On Wed, Apr 1, 2009 at 1:20 PM, Karen Coyle li...@kcoyle.net wrote: Ross, I don't get your point. My point was about the confusion between two things that begin: http:// but that are very different in practice. What's yours? kc Ross Singer wrote: Your email client knew what to do with: info:doi/10./j.1475-4983.2007.00728.x ? doi:10./j.1475-4983.2007.00728.x ? Or did you recognize the info:doi scheme and Google it? Or would this, in case of 99% of the world, just look like gibberish or part of some nerd's PGP key? -Ross. On Wed, Apr 1, 2009 at 1:06 PM, Karen Coyle li...@kcoyle.net wrote: [snip -- Karen's message, quoted in full above] -- --- Karen Coyle / Digital Library Consultant kco...@kcoyle.net http://www.kcoyle.net ph.: 510-540-7596 skype: kcoylenet fx.: 510-848-3913 mo.: 510-435-8234