Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-16 Thread Mike Taylor
Jonathan Rochkind writes:
  There are trade-offs.  I think a lot of that TAG stuff privileges
  the theoretically pure over the on the ground practicalities.
  They've got a great fantasy in their heads of what the semantic web
  _could_ be, and I agree it's theoretically sound and _could_ be;
  but you've got to make it convenient and cheap if you actually want
  it to happen for real, sometimes sacrificing theoretical purity.
  And THAT'S one important lesson of the success of the WWW.

Very true and very important.  I've seen this stated most succinctly
by Clay Shirky:

You cannot simultaneously have mass adoption and rigor.

I hope one day I can come up with eight words as pithy as that.

 _/|____
/o ) \/  Mike Taylor  m...@indexdata.com  http://www.miketaylor.org.uk
)_v__/\  Good craftsmanship may not be art, but good art incorporates
 good craftsmanship -- Jane MacDonald.


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-15 Thread Jonathan Rochkind

Alexander Johannesen wrote:


I think you are quite mistaken on this, but before we leap into whether
the web is suitable for SuDoc I'd rather point out that SuDoc isn't
web friendly in itself, and *that* more than anything stands in the
way of using them with the web.

It stands in the way of using them in the fully realized sem web vision.

It does NOT stand in the way of using them in many useful ways that I 
can and want to use them _right now_. Ways which having a URI to refer 
to them are MUCH helped by. Whether it can resolve or not (YOU just made 
the point that a URI doesn't actually need to resolve, right? I'm still 
confused by this having it both ways -- URIs don't need to resolve, but 
if your URIs don't resolve then you're doing it wrong. Huh?), if you 
have a URI for a SuDoc you can use it in any infrastructure set up to 
accept, store, and relate URIs. Like an OpenURL rft_id, and, yeah, like 
RDF even.  You can make statements about a SuDoc if it has a URI, 
whether or not it resolves, whether or not SuDoc itself is 'web 
friendly'.  One step at a time.
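
Roughly the sort of thing I mean, as a Python sketch -- the SuDoc URI and
the resolver hostname here are entirely made up for illustration, since
there is no official SuDoc URI scheme:

    from urllib.parse import urlencode

    # A hypothetical URI standing in for a SuDoc -- the pattern is invented here.
    sudoc_uri = "http://purl.example.org/sudoc/Y%204.G%2074%2F7%3AS.HRG.110-114"

    # Carry it as an rft_id in an OpenURL; nothing has to resolve for this to work.
    openurl = "http://resolver.example.edu/openurl?" + urlencode({
        "url_ver": "Z39.88-2004",
        "rft_id": sudoc_uri,
    })
    print(openurl)

The URI just rides along as an identifier in existing URI-using
infrastructure; whether it dereferences is a separate question.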


This is my frustration with semantic web stuff, making it harder to do 
things that we _could_ do right here and now, because it violates a 
fantasy of an ideal infrastructure that we may never actually have.


There are business costs, as well as technical problems, to be solved to 
create that ideal fantasy infrastructure. The business costs are _real_



 Also, having a unified resolver for
SuDoc isn't hard, can be at a fixed URL, and use a parameter for
identifiers. You don't need to snoop the non-parameterized section of
a URI to get the IDs ;
  
Okay, Alex, why don't you set this up for us then? And commit to 
providing it persistently indefinitely? Because I don't have the 
resources to do that.  And for the use cases I am confronted with, I 
don't _need_ it, any old URI, even not resolvable, will do--yes, as long 
as I can recognize it as a SuDoc and extract the bare SuDoc out of it. 
Which you say I shouldn't be doing (while others say that's a 
mis-reading of those docs to think I shouldn't be doing it) -- but 
avoiding doing that would raise the costs of my software quite a bit, 
and make the feature infeasible in the first place. Business costs and 
resources _matter_.


I'm being a bit disingenuous here, because rsinger actually already 
_has_ set something like this up, using purl.org. Which isn't perfect, 
but it's there, so fine. I still don't even need it for what I'm doing.




No it's not; if you design your system RESTfully (which, indeed, HTTP
is) then the discovery part can be fast, cached, and using URI
templates embedded in HTTP responses, fully flexible and fit for your
purposes.
  


Feel free to contribute code to my open source project (Umlaut) to 
accomplish the things I need to do in an efficient manner while making 
an HTTP request for every single rft_id that comes in.  These URIs are 
_external_ URIs from third parties, I have no control over whether they 
are designed RESTfully or not.  But you contribute the code, and it's 
good code, I'll be happy to use it.


In the meantime, I'll continue trying to balance functionality, 
maintainability, future expansion, and the programming and hardware 
resources available to me, same as I always do, here in the real world 
when we're building production apps, not R&D experiments, where we don't 
have complete control over the entire environment we operate in. You 
telling me that everything would work great _if only_ everyone in the 
whole world that I need to inter-operate with did things the way you say 
they should -- does absolutely nothing for me. 

And this, again, is my frustration with many of these semantic web 
arguments I'm hearing -- describing an ideal fantasy world that doesn't 
exist, but insisting we act as if it does, even if that means putting 
barriers in the way of actually getting things done.  I'd like to 
actually get things done while moving bit-by-bit toward the semantic web 
vision. I can't if the semantic web vision insists that everything must 
be perfect, and disallows alternate solutions, alternate trade-offs, and 
alternate compromises. I don't have time for that, I'm building actual 
production apps with limited resources.


Jonathan


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-15 Thread Alexander Johannesen
Hiya,

On Thu, Apr 16, 2009 at 01:10, Jonathan Rochkind rochk...@jhu.edu wrote:
 It stands in the way of using them in the fully realized sem web vision.

Ok, I'm puzzled. How? As the SemWeb vision is all about first-order
logic over triplets, and the triplets are defined as URIs, if you can
pop something into a URI you're good to go. So how is it that SuDoc
doesn't fit into this, as you *can* chuck it in a URI? I said it was
unfriendly to the Web, not impossible.

 It does NOT stand in the way of using them in many useful ways that I can
 and want to use them _right now_.

Ah, but then go fix it.

 Ways which having a URI to refer to them
 are MUCH helped by. Whether it can resolve or not (YOU just made the point
 that a URI doesn't actually need to resolve, right? I'm still confused by
 this having it both ways -- URIs don't need to resolve, but if your URIs
 don't resolve then you're doing it wrong. Huh?)

C'mon, it ain't *that* hard. :) URIs as identifiers is fine, having
them resolve as well is great. What's so confusing about that?

 , if you have a URI for a
 SuDoc you can use it in any infrastructure set up to accept, store, and
 relate URIs. Like an OpenURL rft_id, and, yeah, like RDF even.  You can make
 statements about a SuDoc if it has a URI, whether or not it resolves,
 whether or not SuDoc itself is 'web friendly'.  One step at a time.

 This is my frustration with semantic web stuff, making it harder to do
 things that we _could_ do right here and now, because it violates a fantasy
 of an ideal infrastructure that we may never actually have.

Huh? The people who made SuDoc didn't make it web friendly, and thus
the SemWeb stuff is harder to do because it lives on the web? (And
chucking your metadata into HTML as MF or RDF snippets ain't that
hard, it just requires a minimum of knowledge)

 There are business costs, as well as technical problems, to be solved to
 create that ideal fantasy infrastructure. The business costs are _real_

No more real than the cost currently in place. The thing is that a lot
of people see the traditional cost disappear with the advent of SemWeb
and the new costs heavily reduced.

  Also, having a unified resolver for
 SuDoc isn't hard, can be at a fixed URL, and use a parameter for
 identifiers. You don't need to snoop the non-parameterized section of
 a URI to get the IDs ;

 Okay, Alex, why don't you set this up for us then?

Why? I don't give a rat's bottom about SuDoc, don't need it, think it's
poorly designed, and gives me nothing in life. Why should I bother?
(Unless I'm given money for it, then I'll start caring ... :)

 And commit to providing
 it persistently indefinitely? Because I don't have the resources to do that.

Who's behind SuDoc, and are they serious about their creation? Those are
the people you should direct your anger at instead.

  And for the use cases I am confronted with, I don't _need_ it, any old URI,
 even not resolvable, will do--yes, as long as I can recognize it as a SuDoc
 and extract the bare SuDoc out of it.

So what's the problem with just making some stuff up? If you can do
your thing in a vacuum I don't fully understand your problem with the
SemWeb stuff? If you don't want it, don't use it.

 Which you say I shouldn't be doing
 (while others say that's a mis-reading of those docs to think I shouldn't be
 doing it)

No, I think this one is the subtle difference between a URL and a URI.

 but avoiding doing that would raise the costs of my software
 quite a bit, and make the feature infeasible in the first place. Business
 costs and resources _matter_.

As with anything on the Web, you work with what you got, and if you
can fix and share your fix, we all will love you for it. I seriously
don't think I understand what you're getting at here; it's been this
way since the Web popped into existence, and I don't really want it to
be any other way.

 No it's not; if you design your system RESTfully (which, indeed, HTTP
 is) then the discovery part can be fast, cached, and using URI
 templates embedded in HTTP responses, fully flexible and fit for your
 purposes.

 These URIs are
 _external_ URIs from third parties, I have no control over whether they are
 designed RESTfully or not.

Not sure I follow this one. There are no good or bad RESTful URIs,
just URIs. REST is how your framework works with the URIs.

 In the meantime, I'll continue trying to balance functionality,
 maintainability, future expansion, and the programming and hardware
 resources available to me, same as I always do, here in the real world when
 we're building production apps, not R&D experiments

My day job is to balance functionality, maintainability, future
expansion, and the programming and hardware resources available to me,
same as I always do, here in the real world when we're building
production apps ... and I'm using Topic Maps and SemWeb technologies.
Is there something I'm doing which degrades my work to an R&D
experiment, something I should let my customers 

Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-14 Thread Jonathan Rochkind
The difference between URIs and URLs?  I don't believe that URL is something 
that exists any more in any standard, it's all URIs. Correct me if I'm wrong. 

I don't entirely agree with either dogmatic side here, but I do think that 
we've arrived at an awfully confusing (for developers) environment. Re-reading 
the various semantic web TAG position papers people keep referencing, I 
actually don't entirely agree with all of their principles in practice. 

Jonathan

From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Alexander 
Johannesen [alexander.johanne...@gmail.com]
Sent: Tuesday, April 14, 2009 9:27 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] 
registering info: uris?)

Hiya,

Been meaning to jump into this discussion for a while, but I've been
off to an alternative universe and I can't even say it's good to be
back. :) Anyhoo ...

On Fri, Apr 3, 2009 at 03:48, Ray Denenberg, Library of Congress
r...@loc.gov wrote:
 You're right, if there were a web:  URI scheme, the world would be a
 better place.   But it's not, and the world is worse off for it.

I'm rather confused by this statement. The web: URI scheme? The Web
*is* the URI scheme; they are all identifiers to resources (ftp: http:
gopher: https: etc.), and together they make up the, um, web of
things. What am I missing?

 Back in the old days, URIs (or URLs)  were protocol based.

No, which one do you mean, URIs or URLs?

 The ftp scheme
 was for retrieving documents via ftp. The telnet scheme was for telnet. And
 so on.

Again, have I missed something? This has changed, as opposed to the
good old days?

 A few years later the semantic web was conceived and a lot of SW people began
 coining all manner of http URIs that had nothing to do with the http
 protocol.

I've been browsing back and forth this discussion, and couldn't find
much to back this up. What do you mean by this?

 Instead, they should have bit the bullet and coined a new scheme.  They
 didn't, and that's why we're in the mess we're in.

I'm sorry, but mess? Did you know the messiness of the web is
probably what made it successful? Not to mention that having URIs be
identifiers *and* have the ability to resolve them is a bonus; they're
identifiers of things (as they've always been, as I'm sure you know
URI stands for Uniform Resource Identifier, right? :), as in they
consist of a string of characters used to identify or name a resource
on the Internet. And then, if you so choose, you can use the protocol
level to *resolve* them. Not sure how anyone can consider this to be
bad, though.

Or is this just a misunderstanding of the difference between URIs and URLs?


Kind regards,

Alexander
--
---
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
-- http://shelter.nu/blog/ 


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-14 Thread Ray Denenberg, Library of Congress

From: Jonathan Rochkind rochk...@jhu.edu


The difference between URIs and URLs?  I don't believe that URL is 
something that exists any more in any standard, it's all URIs.


The URL is alive and well.

The W3C definition, http://www.w3.org/TR/uri-clarification/
a URL is a type of URI that identifies a resource via a representation of 
its primary access mechanism (e.g., its network location), rather than by 
some other attributes it may have. Thus as we noted, http: is a URI 
scheme. An http URI is a URL.


SRU, for example, considers its request to be a URL.

I do think this conversation has played itself out.   --Ray


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-14 Thread Alexander Johannesen
On Tue, Apr 14, 2009 at 23:34, Jonathan Rochkind rochk...@jhu.edu wrote:
 The difference between URIs and URLs?  I don't believe that URL is 
 something that exists any more in any standard, it's all URIs. Correct me if 
 I'm wrong.

Sure it exists: URLs are a subset of URIs. URLs are locators as
opposed to just identifiers (which is an important distinction, much
used in SemWeb lingo), where URLs are closer to the protocol-like
things Ray describes (or so I think).

 I don't entirely agree with either dogmatic side here, but I do think that 
 we've arrived at an
 awfully confusing (for developers) environment.

But what about it is confusing (apart from us having this discussion
:) ? Is it that we have IDs that happen to *also* resolve? And why is
that confusing?

 Re-reading the various semantic web TAG position papers people keep
 referencing, I actually don't entirely agree with all of their principles in 
 practice.

Well, let me just say that there's more to SemWeb than what comes out of W3C. :)


Kind regards,

Alex
-- 
---
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
-- http://shelter.nu/blog/ 


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-14 Thread Jonathan Rochkind
Can you show me where this definition of a URL vs. a URI is made in any RFC 
or standard-like document?

Sure, we have a _sense_ of how the connotation is different, but I don't think 
that sense is actually formalized anywhere. And that's part of what makes it 
confusing, yeah.  I think the sem web crowd actually embraces this 
confusingness, they want to have it both ways: Oh, a URI doesn't need to 
resolve, it's just an opaque identifier; but you really should use http URIs 
for all URIs; why? because it's important that they resolve. 

In general, combining two functions in one mechanism is a dangerous and 
confusing thing to do in data design, in my opinion. By analogy, it's what gets 
a lot of MARC/AACR2 into trouble.  It's also often a very convenient thing to 
do, and convenience matters. Although ironically, my problem with some of those 
TAG documents is actually that they privilege pure theory over practical 
convenience. 

Over in: http://www.w3.org/2001/tag/doc/URNsAndRegistries-50-2006-08-17.html

They suggest, under URI opacity: 'Agents making use of URIs SHOULD NOT attempt to 
infer properties of the referenced resource.'

I understand why that makes sense in theory, but it's entirely impractical for 
me, as I discovered with the SuDoc experiment (which turned out to be a useful 
experiment at least in understanding my own requirements).  If I get a URI 
representing (eg) a Sudoc (or an ISSN, or an LCCN), I need to be able to tell 
from the URI alone that it IS a Sudoc, AND I need to be able to extract the 
actual SuDoc identifier from it.  That completely violates their Opacity 
requirement, but it's entirely infeasible to require me to make an individual 
HTTP request for every URI I find, to figure out what it IS.  Infeasible for 
performance and cost reasons, and infeasible because it requires a lot more 
development effort at BOTH ends -- it means that every single URI _would_ have 
to de-reference to an RDF representation capable of telling me it identifies a 
SuDoc and what the actual bare SuDoc is. Contrary to the protestations that a 
URI is different from a URL and does not need to resolve, following 
the opacity recommendation/requirement would mean that resolution 
would be absolutely required in order for me to use it.   Meaning that someone 
minting the URI would have to provide that infrastructure, and I as a client 
would have to write code to use it.  
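
To make concrete what I mean by recognize and extract -- a rough Python
sketch, using an entirely hypothetical URI pattern, since no official SuDoc
URI scheme exists:

    import re
    from urllib.parse import unquote

    # Hypothetical pattern: a URI whose path carries a percent-encoded SuDoc.
    # The prefix is invented; the point is that the client has to know the
    # minter's convention to do this at all.
    SUDOC_URI = re.compile(r"^http://purl\.example\.org/sudoc/(?P<sudoc>.+)$")

    def extract_sudoc(uri):
        """Return the bare SuDoc if the URI matches the known pattern, else None."""
        m = SUDOC_URI.match(uri)
        return unquote(m.group("sudoc")) if m else None

    print(extract_sudoc("http://purl.example.org/sudoc/Y%204.G%2074%2F7%3AS.HRG.110-114"))
    # -> Y 4.G 74/7:S.HRG.110-114

No HTTP request anywhere in there -- which is exactly the part the opacity
recommendation tells me not to rely on.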

But I just want a darn SuDoc in a URI -- and there are advantages to putting a 
SuDoc in a URI _precisely_ so it can be used in URI-using infrastructures like 
RDF, and these advantages hold _even if_ it's not resolvable and we ignore the 
'opacity' recommendation. There are trade-offs.  I think a lot of that TAG 
stuff privileges the theoretically pure over the on the ground practicalities. 
They've got a great fantasy in their heads of what the semantic web _could_ be, 
and I agree it's theoretically sound and _could_ be; but you've got to make it 
convenient and cheap if you actually want it to happen for real, sometimes 
sacrificing theoretical purity.   And THAT'S one important lesson of the 
success of the WWW. 

Jonathan


From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Alexander 
Johannesen [alexander.johanne...@gmail.com]
Sent: Tuesday, April 14, 2009 9:48 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] 
registering info: uris?)

On Tue, Apr 14, 2009 at 23:34, Jonathan Rochkind rochk...@jhu.edu wrote:
 The difference between URIs and URLs?  I don't believe that URL is 
 something that exists any more in any standard, it's all URIs. Correct me if 
 I'm wrong.

Sure it exists: URLs are a subset of URIs. URLs are locators as
opposed to just identifiers (which is an important distinction, much
used in SemWeb lingo), where URLs are closer to the protocol-like
things Ray describes (or so I think).

 I don't entirely agree with either dogmatic side here, but I do think that 
 we've arrived at an
 awfully confusing (for developers) environment.

But what about it is confusing (apart from us having this discussion
:) ? Is it that we have IDs that happen to *also* resolve? And why is
that confusing?

 Re-reading the various semantic web TAG position papers people keep
 referencing, I actually don't entirely agree with all of their principles in 
 practice.

Well, let me just say that there's more to SemWeb than what comes out of W3C. :)


Kind regards,

Alex
--
---
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
-- http://shelter.nu/blog/ 


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-14 Thread Jonathan Rochkind
Thanks Ray. By that definition ALL http URIs are URLs, a priori.  I read 
Alexander as trying to make a different distinction.


Ray Denenberg, Library of Congress wrote:

From: Jonathan Rochkind rochk...@jhu.edu


  
The difference between URIs and URLs?  I don't believe that URL is 
something that exists any more in any standard, it's all URIs.



The URL is alive and well.

The W3C definition, http://www.w3.org/TR/uri-clarification/
 a URL is a type of URI that identifies a resource via a representation of 
its primary access mechanism (e.g., its network location), rather than by 
some other attributes it may have. Thus as we noted, http: is a URI 
scheme. An http URI is a URL.


SRU, for example, considers its request to be a URL.

I do think this conversation has played itself out.   --Ray

  


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-14 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Jonathan Rochkind
 Sent: Tuesday, April 14, 2009 10:21 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] resolution and identification (was Re:
 [CODE4LIB] registering info: uris?)
 
 Over in: http://www.w3.org/2001/tag/doc/URNsAndRegistries-50-2006-08-
 17.html
 
 They suggest, under URI opacity: 'Agents making use of URIs SHOULD NOT
 attempt to infer properties of the referenced resource.'
 
 I understand why that makes sense in theory, but it's entirely
 impractical for me, as I discovered with the SuDoc experiment (which
 turned out to be a useful experiment at least in understanding my own
 requirements).  If I get a URI representing (eg) a Sudoc (or an ISSN,
 or an LCCN), I need to be able to tell from the URI alone that it IS a
 Sudoc, AND I need to be able to extract the actual SuDoc identifier
 from it.  That completely violates their Opacity requirement, but it's
 entirely infeasible to require me to make an individual HTTP request
 for every URI I find, to figure out what it IS.

Jonathan, you need to take URI opacity in context.  The document is correct
in suggesting that user agents should not attempt to infer properties of
the referenced resource.  The Architecture of the Web is also clear on this
point and includes an example.  Just because a resource URI ends in .html
does not mean that HTML will be the representation being returned.  The
user agent is inferring a property by looking at the end of the URI to see
if it ends in .html, e.g., that the Web Document will be returning HTML.  If 
you really want to know for sure you need to dereference it with a HEAD 
request.

Now having said that, URI opacity applies to user agents dealing with *any*
URIs that they come across in the wild.  They should not try to infer any
semantics from the URI itself.  However, this doesn't mean that the minter
of a URI cannot create a policy decision for a group of URIs under their
control that contain semantics.  In your example, you made a policy 
decision about the URIs you were minting for SUDOCs such that the actual
SUDOC identifier would appear someplace in the URI.  This is perfectly
fine and is the basis for REST URIs, but understand you created a specific
policy statement for those URIs, and if a user agent is aware of your policy
statements about the URIs you mint, then they can infer semantics from
the URIs you minted.

Does that break URI opacity from a user agent's perspective?  No.  It just
means that those user agents who know about your policy can infer semantics
from your URIs and those that don't should not infer any semantics because
they don't know what the policies are, e.g., you could be returning PDF
representations when the URI ends in .html, if that was your policy, and
the only way for a user agent to know that is to dereference the URI with 
either HEAD or GET when they don't know what the policies are.
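
A rough Python illustration of that distinction -- the URI prefix and the
policy here are hypothetical, not anything OCLC or LC actually publishes:

    import urllib.request

    # A client that knows the minter's (hypothetical) policy may read the SuDoc
    # straight out of the URI; any other URI stays opaque and gets dereferenced.
    KNOWN_PREFIX = "http://purl.example.org/sudoc/"

    def describe(uri):
        if uri.startswith(KNOWN_PREFIX):
            return ("sudoc", uri[len(KNOWN_PREFIX):])      # policy-based inference
        # Unknown minter: don't guess from the URI, ask the server instead.
        req = urllib.request.Request(uri, method="HEAD")
        with urllib.request.urlopen(req) as resp:
            return ("unknown", resp.headers.get("Content-Type"))

The branch taken depends entirely on whether the user agent knows the
minter's policy, which is the point being made above.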


Andy.


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-14 Thread Jonathan Rochkind
Am I not an agent making use of a URI who is attempting to infer 
properties from it? Like that it represents a SuDoc, and in particular 
what that SuDoc is?


If this kind of Talmudic parsing of the TAG recommendations to figure 
out what they _really_ mean is necessary, I stand by my statement that 
the environment those TAG documents are encouraging is a confusing one.


Jonathan

Houghton,Andrew wrote:

From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Jonathan Rochkind
Sent: Tuesday, April 14, 2009 10:21 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] resolution and identification (was Re:
[CODE4LIB] registering info: uris?)

Over in: http://www.w3.org/2001/tag/doc/URNsAndRegistries-50-2006-08-
17.html

They suggest, under URI opacity: 'Agents making use of URIs SHOULD NOT
attempt to infer properties of the referenced resource.'

I understand why that makes sense in theory, but it's entirely
impractical for me, as I discovered with the SuDoc experiment (which
turned out to be a useful experiment at least in understanding my own
requirements).  If I get a URI representing (eg) a Sudoc (or an ISSN,
or an LCCN), I need to be able to tell from the URI alone that it IS a
Sudoc, AND I need to be able to extract the actual SuDoc identifier
from it.  That completely violates their Opacity requirement, but it's
entirely infeasible to require me to make an individual HTTP request
for every URI I find, to figure out what it IS.



Jonathan, you need to take URI opacity in context.  The document is correct
in suggesting that user agents should not attempt to infer properties of
the referenced resource.  The Architecture of the Web is also clear on this
point and includes an example.  Just because a resource URI ends in .html
does not mean that HTML will be the representation being returned.  The
user agent is inferring a property by looking at the end of the URI to see
if it ends in .html, e.g., that the Web Document will be returning HTML.  If 
you really want to know for sure you need to dereference it with a HEAD 
request.


Now having said that, URI opacity applies to user agents dealing with *any*
URIs that they come across in the wild.  They should not try to infer any
semantics from the URI itself.  However, this doesn't mean that the minter
of a URI cannot create a policy decision for a group of URIs under their
control that contain semantics.  In your example, you made a policy 
decision about the URIs you were minting for SUDOCs such that the actual

SUDOC identifier would appear someplace in the URI.  This is perfectly
fine and is the basis for REST URIs, but understand you created a specific
policy statement for those URIs, and if a user agent is aware of your policy
statements about the URIs you mint, then they can infer semantics from
the URIs you minted.

Does that break URI opacity from a user agent's perspective?  No.  It just
means that those user agents who know about your policy can infer semantics
from your URIs and those that don't should not infer any semantics because
they don't know what the policies are, e.g., you could be returning PDF
representations when the URI ends in .html, if that was your policy, and
the only way for a user agent to know that is to dereference the URI with 
either HEAD or GET when they don't know what the policies are.



Andy.

  


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-14 Thread Joe Atzberger
The User Agent is understood to be a typical browser, or other piece of
software, like wget, curl, etc.  It's the thing implementing the client side
of the specs.  I don't think you are operating as a user agent here as
much as you are a server application.  That is, assuming I have any idea
what you're actually doing.

--Joe

On Tue, Apr 14, 2009 at 11:27 AM, Jonathan Rochkind rochk...@jhu.edu wrote:

 Am I not an agent making use of a URI who is attempting to infer properties
 from it? Like that it represents a SuDoc, and in particular what that SuDoc
 is?

  If this kind of Talmudic parsing of the TAG recommendations to figure out
  what they _really_ mean is necessary, I stand by my statement that the
 environment those TAG documents are encouraging is a confusing one.

 Jonathan


 Houghton,Andrew wrote:

 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Jonathan Rochkind
 Sent: Tuesday, April 14, 2009 10:21 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] resolution and identification (was Re:
 [CODE4LIB] registering info: uris?)

 Over in: http://www.w3.org/2001/tag/doc/URNsAndRegistries-50-2006-08-
 17.html

  They suggest, under URI opacity: 'Agents making use of URIs SHOULD NOT
  attempt to infer properties of the referenced resource.'

 I understand why that makes sense in theory, but it's entirely
 impractical for me, as I discovered with the SuDoc experiment (which
 turned out to be a useful experiment at least in understanding my own
 requirements).  If I get a URI representing (eg) a Sudoc (or an ISSN,
 or an LCCN), I need to be able to tell from the URI alone that it IS a
 Sudoc, AND I need to be able to extract the actual SuDoc identifier
 from it.  That completely violates their Opacity requirement, but it's
 entirely infeasible to require me to make an individual HTTP request
 for every URI I find, to figure out what it IS.



 Jonathan, you need to take URI opacity in context.  The document is
 correct
 in suggesting that user agents should not attempt to infer properties of
 the referenced resource.  The Architecture of the Web is also clear on
 this
 point and includes an example.  Just because a resource URI ends in .html
 does not mean that HTML will be the representation being returned.  The
 user agent is inferring a property by looking at the end of the URI to see
 if it ends in .html, e.g., that the Web Document will be returning HTML.
  If you really want to know for sure you need to dereference it with a HEAD
 request.

 Now having said that, URI opacity applies to user agents dealing with
 *any*
 URIs that they come across in the wild.  They should not try to infer any
 semantics from the URI itself.  However, this doesn't mean that the minter
 of a URI cannot create a policy decision for a group of URIs under their
 control that contain semantics.  In your example, you made a policy
 decision about the URIs you were minting for SUDOCs such that the actual
 SUDOC identifier would appear someplace in the URI.  This is perfectly
 fine and is the basis for REST URIs, but understand you created a specific
 policy statement for those URIs, and if a user agent is aware of your
 policy
 statements about the URIs you mint, then they can infer semantics from
 the URIs you minted.

  Does that break URI opacity from a user agent's perspective?  No.  It just
 means that those user agents who know about your policy can infer
 semantics
 from your URIs and those that don't should not infer any semantics because
 they don't know what the policies are, e.g., you could be returning PDF
 representations when the URI ends in .html, if that was your policy, and
 the only way for a user agent to know that is to dereference the URI with
 either HEAD or GET when they don't know what the policies are.


 Andy.






Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-14 Thread Alexander Johannesen
On Wed, Apr 15, 2009 at 00:20, Jonathan Rochkind rochk...@jhu.edu wrote:
 Can you show me where this definition of a URL vs. a URI is made in any 
 RFC or standard-like document?

From http://www.faqs.org/rfcs/rfc3986.html ;

1.1.3.  URI, URL, and URN

   A URI can be further classified as a locator, a name, or both.  The
   term Uniform Resource Locator (URL) refers to the subset of URIs
   that, in addition to identifying a resource, provide a means of
   locating the resource by describing its primary access mechanism
   (e.g., its network location).  The term Uniform Resource Name
   (URN) has been used historically to refer to both URIs under the
   urn scheme [RFC2141], which are required to remain globally unique
   and persistent even when the resource ceases to exist or becomes
   unavailable, and to any other URI with the properties of a name.

   An individual scheme does not have to be classified as being just one
   of name or locator.  Instances of URIs from any given scheme may
   have the characteristics of names or locators or both, often
   depending on the persistence and care in the assignment of
   identifiers by the naming authority, rather than on any quality of
   the scheme.  Future specifications and related documentation should
   use the general term URI rather than the more restrictive terms
   URL and URN [RFC3305].

As you can see, a URI is an identifier, and a URL is a locator
(mechanism for retrieval), and since URLs are a subset of URIs, you
_can_ resolve URIs as well.

 Sure, we have a _sense_ of how the connotation is different, but
 I don't think that sense is actually formalized anywhere.

It is, and the same stuff is documented in Wikipedia as well ;

   http://en.wikipedia.org/wiki/Uniform_Resource_Identifier
   http://en.wikipedia.org/wiki/Uniform_Resource_Locator

 I think the sem web crowd actually embraces this confusingness,

No, I think they take it at face value; they (the URIs) are
identifiers for things, and can be used for just that purpose, but
they are also URLs, which means they resolve to something. What I think
you're getting at is the thing it resolves to, as *that*
has no definition. But then, if you go from RDF to Topic Maps PSIs
(PSIs are URIs with an extended meaning), *that* thing it resolves to
indeed has a definition; it's the prose explaining what the identifier
identifies, and this is the most important difference between RDF and
Topic Maps (and a very subtle but important difference, too).

 they want to have it both ways: Oh, a URI doesn't need to resolve,
 it's just an opaque identifier; but you really should use http URIs
 for all URIs; why? because it's important that they resolve.

I smell straw-man. :) But yes, they do want both, as both is in fact a
friggin' smart thing to have. We all deal with identifiers all the
time, in internal as well as external applications, so why not use an
identifier scheme that has the added bonus of adding a resolver
mechanism? If you want to be stupid and lock yourself in your limited
world, then using them as just identifiers is fine but perhaps a bit,
well, stupid. But if you want to be smart about it, realizing that
without ontological work there will *never* be proper interop, you use
those identifiers and let them resolve to something. And if you're
really smart, you let them resolve to either more RDF statements, or,
if you're seriously Einsteinly smart, use PSIs (as in Topic Maps) :).

 In general, combining two functions in one mechanism is a
 dangerous and confusing thing to do in data design, in my opinion.

Because ... ?

 By analogy, it's what gets a lot of MARC/AACR2 into trouble.

Hmm, and I thought it was crap design that did that, coupled with poor
metadata constraints and validation channels, untyped fields, poor
tooling, the lack of machine understandability, and the general
library idiom of not invented here. But correct me if I'm wrong. :)

 Over in: http://www.w3.org/2001/tag/doc/URNsAndRegistries-50-2006-08-17.html

Umm, I'd be wary of taking as canon a draft with editorial notes going
back 4 to 5 years that still aren't resolved. In other words, this
document isn't relevant to the real world. Yet.

 They suggest, under URI opacity: 'Agents making use of URIs SHOULD NOT attempt 
 to infer properties of the referenced resource.'

Well, as a RESTafarian I understand this argument quite well. It's
about not assuming too much from the internal structure of the URI.
Again, it's an identifier, not a scheme such as a URL where structure
is defined. Again, for URIs, don't assume structure because at this
point it isn't a URL.

 If I get a URI representing (eg) a Sudoc (or an ISSN, or an LCCN), I need to
 be able to tell from the URI alone that it IS a Sudoc, AND I need to be able
 to extract the actual SuDoc identifier from it.  That completely violates 
 their
 Opacity requirement

I think you are quite mistaken on this, but before we leap into whether
the web is suitable for SuDoc I'd 

Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-10 Thread Jonathan Rochkind
Well, the thing is, those sem web folks LIKE what has resulted. They think it's 
_good_ that http:// can be resolved with a certain protocol in some cases, but 
can be an arbitrary identifier untied to protocol in others. 

It definitely is convenient in some cases.  

I have mixed feelings, I don't think it's a disaster, but I'm not sure it's 
always a good idea. 

Jonathan

From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Mike Taylor 
[m...@indexdata.com]
Sent: Thursday, April 02, 2009 2:33 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] 
registering info: uris?)

An account that has a depressing ring of accuracy to it.

Ray Denenberg, Library of Congress writes:
  You're right, if there were a web:  URI scheme, the world would be a
  better place.   But it's not, and the world is worse off for it.
 
  It shouldn't surprise anyone that I am sympathetic to Karen's criticisms.
  Here is some of my historical perspective (which may well differ from
  others').
 
  Back in the old days, URIs (or URLs)  were protocol based.  The ftp scheme
  was for retrieving documents via ftp. The telnet scheme was for telnet. And
  so on.   Some of you may remember the ZIG (Z39.50 Implementors Group) back
  when we developed the z39.50 URI scheme, which was around 1995. Most of us
  were not wise to the ways of the web that long ago, but we were told, by
  those who were, that z39.50r: and z39.50s:  at the beginning of a URL
  are explicit indications that the URI is to be resolved by Z39.50.
 
  A few years later the semantic web was conceived and a lot of SW people began
  coining all manner of http URIs that had nothing to do with the http
  protocol.   By the time the rest of the world noticed, there were so many
  that it was too late to turn back. So instead, history was altered.  The
  company line became we never told you that the URI scheme was tied to a
  protocol.
 
  Instead, they should have bit the bullet and coined a new scheme.  They
  didn't, and that's why we're in the mess we're in.
 
  --Ray
 
 
  - Original Message -
  From: Houghton,Andrew hough...@oclc.org
  To: CODE4LIB@LISTSERV.ND.EDU
  Sent: Thursday, April 02, 2009 9:41 AM
  Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB]
  registering info: uris?)
 
 
   From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
   Karen Coyle
   Sent: Wednesday, April 01, 2009 2:26 PM
   To: CODE4LIB@LISTSERV.ND.EDU
   Subject: Re: [CODE4LIB] resolution and identification (was Re:
   [CODE4LIB] registering info: uris?)
  
   This really puzzles me, because I thought http referred to a protocol:
    hypertext transfer protocol. And when you put "http://" in front of
   something you are indicating that you are sending the following string
   along to be processed by that protocol. It implies a certain
   application
    over the web, just as "mailto:" implies a particular application. Yes,
   http is the URI for the hypertext transfer protocol. That doesn't
   negate the fact that it indicates a protocol.
  
   RFC 3986 (URI generic syntax) says that http: is a URI scheme not a
   protocol.  Just because it says http people make all kinds of
   assumptions about type of use, persistence, resolvability, etc.  As I
   indicated in a prior message, whoever registered the http URI scheme
   could have easily used the token web: instead of http:.  All the
   URI scheme in RFC 3986 does is indicate what the syntax of the rest
   of the URI will look like.  That's all.  You give an excellent
   example: mailto.  The mailto URI scheme does not imply a particular
   application.  It is a URI scheme with a specific syntax.  That URI
   is often resolved with the SMTP (mail) protocol.  Whoever registered
   the mailto URI scheme could have specified the token as smtp:
    instead of "mailto:".
  
   My reading of Cool URIs is
   that they use the protocol, not just the URI. If they weren't intended
   to take advantage of http then W3C would have used something else as a
   URI. Read through the Cool URIs document and it's not about
   identifiers,
   it's all about using the *protocol* in service of identifying. Why use
   http?
  
   I'm assuming here when you say My reading of Cool URIs... means reading
   the Cool URIs for the Semantic Web document and not the Cool URIs Don't
   Change document.  The Cool URIs for the Semantic Web document is about
    linked data.  Tim Berners-Lee's four linked data principles state:
  
 1. Use URIs as names for things.
 2. Use HTTP URIs so that people can look up those names.
 3. When someone looks up a URI, provide useful information.
 4. Include links to other URIs. so that they can discover more things.
  
   (2) is an important aspect to linking.  The Web is a hypertext based
   system
   that uses HTTP URIs to identify resources.  If you want to link, then you

Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Ray Denenberg, Library of Congress

No,  not identical URIs.

Let's say I've put a copy of the schema permanently at each of the following 
locations.

http://www.loc.gov/standards/mods/v3/mods-3-3.xsd
http://www.acme.com//mods-3-3.xsd
http://www.takoma.org/standards/mods-3-3.xsd

Three locations, three URIs.

But the issue of redirect or even resolution is irrelevant in the use case 
I'm citing.   I'm talking about the use of an identifier within a protocol, 
for the sole purpose of identifying an object that the recipient of the URI 
already has - or if it doesn't have it it isn't going to retrieve it, it 
will just fail the request.   The purpose of the identifier is to enable the 
server to determine whether it has the schema that the client is looking 
for.  (And by the way that should answer Ed's question about a use case.)


So the server has some table of schemas, in that table is the row:

[mods schema]   [ URI identifying the mods schema]

It receives the SRU request:
http://z3950.loc.gov:7090/voyager?version=1.1&operation=searchRetrieve&query=dinosaur&maximumRecords=1&recordSchema=[URI 
identifying the mods schema]


If the URI identifying the MODS schema in the request matches the URI in 
the table, then the server knows what schema the client wants, and it 
proceeds.  If there are multiple identifiers then it has to have a row in 
its table for each.
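
A toy Python sketch of that matching step -- the table rows are only
illustrative, this is not SRU server code:

    # Hypothetical schema table: identifier URI -> schema held by the server.
    # Multiple identifiers for the same schema simply mean multiple rows.
    SCHEMAS = {
        "info:srw/schema/1/mods-v3.3": "mods",
        "http://www.loc.gov/standards/mods/v3/mods-3-3.xsd": "mods",
    }

    def resolve_record_schema(record_schema_uri):
        """Return the schema name if the client's recordSchema URI matches a row."""
        try:
            return SCHEMAS[record_schema_uri]
        except KeyError:
            # No match: the server just fails the request.
            raise ValueError("unsupported recordSchema: " + record_schema_uri)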


Does that make sense?

--Ray


- Original Message - 
From: Ross Singer rossfsin...@gmail.com

To: CODE4LIB@LISTSERV.ND.EDU
Sent: Wednesday, April 01, 2009 2:07 PM
Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] 
registering info: uris?)




Ray, you are absolutely right.  These would be bad identifiers.  But
let's say they're all identical (which I think is what you're saying,
right?), then this just strengthens the case for indirection through a
service like purl.org.  Then it doesn't *matter* that all of these are
different locations, there is one URI that represents the concept of
what is being kept at these locations.  At the end of the redirect can
be some sort of 300 response that lets the client pick which endpoint
is right for them -or arbitrarily chooses one for them.

-Ross.

On Wed, Apr 1, 2009 at 1:59 PM, Ray Denenberg, Library of Congress
r...@loc.gov wrote:

We do just fine minting our URIs at LC, Andy. But we do appreciate your
concern.

The analysis of our MODS URIs misses the point, I'm afraid. Let's forget
the set I cited (bad example) and assume that the schema is replicated at
several locations (geographically dispersed) all of which are planned to
house the specific version permanently. The suggestion to designate one as
canonical is a good suggestion but it isn't always possible (for various
reasons, possibly political). So I maintain that in this scenario you have
several *locations*, none of which serves well as an identifier. I'm not
arguing (here) that info is better than http (for this scenario), just that
these are not good identifiers.

--Ray

- Original Message - From: Houghton,Andrew hough...@oclc.org
To: CODE4LIB@LISTSERV.ND.EDU
Sent: Wednesday, April 01, 2009 1:21 PM
Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB]
registering info: uris?)



From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Karen Coyle
Sent: Wednesday, April 01, 2009 1:06 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] resolution and identification (was Re:
[CODE4LIB] registering info: uris?)

The general convention is that "http://" is a web address, a location. I
realize that it's also a form of URI, but that's a minority use of http.
This leads to a great deal of confusion. I understand the desire to use
domain names as a way to create unique, managed identifiers, but the
http part is what is causing us problems.


http:// is an HTTP URI, defined by RFC 3986; loosely I will agree that
it is a web address. However, it is not a location. URIs according
to RFC 3986 are just tokens to identify resources. These tokens, e.g.,
URIs are presented to protocol mechanisms as part of the dereferencing
process to locate and retrieve a representation of the resource.

People see http: and assume that it means the HTTP protocol so it must
be a locator. Whoever initially registered the HTTP URI scheme could
have used web as the token instead and we would all be doing:
web://example.org/. This is the confusion. People don't understand
what RFC 3986 is saying. It makes no claim that any URI registered
scheme has persistence or can be dereferenced. An HTTP URI is just a
token to identify some resource, nothing more.


Andy.




Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Ray Denenberg, Library of Congress
 Sent: Wednesday, April 01, 2009 1:59 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] resolution and identification (was Re:
 [CODE4LIB] registering info: uris?)
 
 We do just fine minting our URIs at LC, Andy. But we do appreciate your
 concern.

Sorry Ray, that statement wasn't directed at LC in particular, but was a 
general statement.  OCLC doesn’t do any better in this area, especially 
with WorldCat where there are the same issues I pointed out with your 
examples and additional issues to boot.  The point I was trying to make
was *all* organizations need to have clear policies on creating, 
maintaining, persistence, etc.  Failure to do so creates a big mess 
that takes time to fix, often creating headaches for those using an 
organizations URIs.  Take for example when NISO redesigned their site 
and broke all the URIs to their standards.  Tim Berners-Lee addresses 
this in his Cool URIs Don't Change article.

 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Ross Singer
 Sent: Wednesday, April 01, 2009 2:07 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] resolution and identification (was Re:
 [CODE4LIB] registering info: uris?)
 
 Ray, you are absolutely right.  These would be bad identifiers.  But
 let's say they're all identical (which I think is what you're saying,
 right?), then this just strengthens the case for indirection through a
 service like purl.org.  Then it doesn't *matter* that all of these are
 different locations, there is one URI that represents the concept of
 what is being kept at these locations.  At the end of the redirect can
 be some sort of 300 response that lets the client pick which endpoint
 is right for them -or arbitrarily chooses one for them.

Exactly, but purl.org is just using standard HTTP protocol mechanisms 
which could be easily done by LC's site given Ray's examples.

What is at issue is the identification of a Real World Object URI for
MODS v3.3.  Whether I get back an XML schema, a RelaxNG schema, etc.
are just Web Documents or representations of that abstract Real World 
Object.  What Ross did was make the PURL the Real World Object URI for
MODS v3.3 and used it to redirect to the geographically distributed
Web Documents, e.g., representations.  LC could have just as well
minted one under its own domain.


Andy.


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Karen Coyle
 Sent: Wednesday, April 01, 2009 2:26 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] resolution and identification (was Re:
 [CODE4LIB] registering info: uris?)
 
 This really puzzles me, because I thought http referred to a protocol:
  hypertext transfer protocol. And when you put "http://" in front of
 something you are indicating that you are sending the following string
 along to be processed by that protocol. It implies a certain
 application
  over the web, just as "mailto:" implies a particular application. Yes,
 http is the URI for the hypertext transfer protocol. That doesn't
 negate the fact that it indicates a protocol. 

RFC 3986 (URI generic syntax) says that http: is a URI scheme not a
protocol.  Just because it says http people make all kinds of 
assumptions about type of use, persistence, resolvability, etc.  As I
indicated in a prior message, whoever registered the http URI scheme
could have easily used the token web: instead of http:.  All the
URI scheme in RFC 3986 does is indicate what the syntax of the rest
of the URI will look like.  That's all.  You give an excellent
example: mailto.  The mailto URI scheme does not imply a particular
application.  It is a URI scheme with a specific syntax.  That URI
is often resolved with the SMTP (mail) protocol.  Whoever registered
the mailto URI scheme could have specified the token as smtp:
instead of "mailto:".

 My reading of Cool URIs is
 that they use the protocol, not just the URI. If they weren't intended
 to take advantage of http then W3C would have used something else as a
 URI. Read through the Cool URIs document and it's not about
 identifiers,
 it's all about using the *protocol* in service of identifying. Why use
 http?

I'm assuming here when you say My reading of Cool URIs... means reading
the Cool URIs for the Semantic Web document and not the Cool URIs Don't
Change document.  The Cool URIs for the Semantic Web document is about
linked data.  Tim Berners-Lee's four linked data principles state:

   1. Use URIs as names for things.
   2. Use HTTP URIs so that people can look up those names.
   3. When someone looks up a URI, provide useful information.
   4. Include links to other URIs. so that they can discover more things.

(2) is an important aspect to linking.  The Web is a hypertext based system
that uses HTTP URIs to identify resources.  If you want to link, then you 
need to use HTTP URIs.  There is only one protocol, today, that accepts 
HTTP URIs as currency and it's appropriately called HTTP and defined by 
RFC 2616.

The Cool URIs for the Semantic Web document describes how an HTTP protocol
implementation (of RFC 2616) should respond to a dereference of an HTTP URI.
It's important to understand that URIs are just tokens that *can* be presented 
to a protocol for resolution.  It's up to the protocol to define the currency
that it will accept, e.g., HTTP URIs, and it's up to an implementation of the
protocol to define the tokens of that currency that it will accept.

It just so happens that HTTP URIs are accepted by the HTTP protocol, but in
the case of mailto URIs they are accepted by the SMTP protocol.  However,
it is important to note that a HTTP user agent, e.g., a browser, accepts
both HTTP and mailto URIs.  It decides that it should send the mailto URI
to an SMTP user agent, e.g., Outlook, Thunderbird, etc. or it should
dereference the HTTP URI with the HTTP protocol.  In fact the HTTP protocol
doesn't directly accept HTTP URIs.  As part of the dereference process the
HTTP user agent needs to break apart the HTTP URI and present it to the HTTP
protocol.  For example the HTTP URI: http://example.org/ becomes the HTTP 
protocol request:

GET / HTTP/1.1
Host: example.org
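
A small Python sketch of that split, using the standard library's URI
parser -- roughly what any HTTP user agent does before it ever talks to
the protocol:

    from urllib.parse import urlsplit

    def to_request(uri):
        """Break an HTTP URI apart into the request line and Host header."""
        parts = urlsplit(uri)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        return "GET %s HTTP/1.1\r\nHost: %s\r\n\r\n" % (path, parts.netloc)

    print(to_request("http://example.org/"))
    # GET / HTTP/1.1
    # Host: example.org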

Think of a URI as a minted token.  The New York subway mints tokens to ride 
the subway to get to a destination.  Placing a U.S. quarter or a Boston
subway token in a turnstile will not allow you to pass.  You must use the 
New York subway minted token, e.g., currency.  URIs are the same.  OCLC 
can mint HTTP URI tokens and LC can mint HTTP URI tokens, both are using
the HTTP URI currency, but sending LC HTTP URI tokens, e.g., Boston subway
tokens, to OCLC's Web server will most likely result in a 404; you cannot
pass since OCLC's Web server only accepts OCLC tokens, e.g., New York subway
tokens, that identify a resource under its control.


Andy.


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Mike Taylor
 Sent: Thursday, April 02, 2009 8:41 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] resolution and identification (was Re:
 [CODE4LIB] registering info: uris?)
 
 I have to say I am suspicious of schemes like PURL, which for all
 their good points introduce a single point of failure into, well,
 everything that uses them.  That can't be good.  Especially as it's
 run by the same company that also runs the often-unavailable OpenURL
 registry.

What you are saying is that you are suspicious of the HTTP protocol.  All
the PURL server does is use mechanisms specified by the HTTP protocol.
Any HTTP server is capable of implementing those same mechanisms.  The
actual PURL server is a community based service that allows people to
create HTTP URIs that redirect to other URIs without having to run an 
actual HTTP server.  If you don't like its single point of failure, then 
create your own in-house service using your existing HTTP server.  I 
believe the source code for the entire PURL service is freely available 
and other people have taken the opportunity to run their own in-house or 
community based service.
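
To illustrate: a bare-bones Python/WSGI sketch of that same redirect
mechanism -- the mapping is invented, and this is obviously not the PURL
codebase itself, just the HTTP behavior it relies on:

    # Minimal PURL-style redirector: look up the path, answer 302 with a Location.
    from wsgiref.simple_server import make_server

    REDIRECTS = {   # invented mapping, for illustration only
        "/net/mods-3-3": "http://www.loc.gov/standards/mods/v3/mods-3-3.xsd",
    }

    def app(environ, start_response):
        target = REDIRECTS.get(environ.get("PATH_INFO", ""))
        if target:
            start_response("302 Found", [("Location", target)])
        else:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b""]

    if __name__ == "__main__":
        make_server("", 8000, app).serve_forever()

Any HTTP server can be configured to do the same thing, which is the point
above.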


Andy.


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Mike Taylor
Houghton,Andrew writes:
   I have to say I am suspicious of schemes like PURL, which for all
   their good points introduce a single point of failure into, well,
   everything that uses them.  That can't be good.  Especially as
    it's run by the same company that also runs the often-unavailable
   OpenURL registry.
  
  What you are saying is that you are suspicious of the HTTP protocol.

That is NOT what I am saying.

I am saying I am suspicious of a single point of failure.  Especially
since the entire architecture of the Internet was (rightly IMHO)
designed with the goal of avoiding SPOFs.

 _/|____
/o ) \/  Mike Taylor  m...@indexdata.com  http://www.miketaylor.org.uk
)_v__/\  In My Egotistical Opinion, most people's C programs should
 be indented six feet downward and covered with dirt -- Blair
 P. Houghton.


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Karen Coyle

Houghton,Andrew wrote:

RFC 3986 (URI generic syntax) says that http: is a URI scheme not a
protocol.  Just because it says http people make all kinds of 
assumptions about type of use, persistence, resolvability, etc.




And RFC 2616 (Hypertext transfer protocol) says:

The HTTP protocol is a request/response protocol. A client sends a 
request to the server in the form of a request method, URI, and protocol 
version, followed by a MIME-like message containing request modifiers, 
client information, and possible body content over a connection with a 
server.


So what you are saying is that it's ok to use the URI for the hypertext 
transfer protocol in a way that ignores RFC 2616. I'm just not sure how 
functional that is, in the grand scheme of things. And when you say:



The Cool URIs for the Semantic Web document describes how an HTTP protocol
implementation (of RFC 2616) should respond to a dereference of an HTTP URI.


I think you are deliberately distorting the intent of the Cool URIs
document. You seem to read it as: *given* an http URI, here is how the
protocol should respond. But in fact the Cool URIs document asks the
question "So the question is, what URIs should we use in RDF?" and
responds that one should use http URIs for the reason that:


Given only a URI, machines and people should be able to retrieve a 
description about the resource identified by the URI from the Web. Such 
a look-up mechanism is important to establish shared understanding of 
what a URI identifies. Machines should get RDF data and humans should 
get a readable representation, such as HTML. The standard Web transfer 
protocol, HTTP, should be used.


So it doesn't just say how to respond to an http URI; it says to use 
http URIs *because* there is a useful possible response. That's a very 
different statement. It is significant that (as Mike pointed out, perhaps
inadvertently) no one is using mailto: or ftp: as identifiers. That's 
not a coincidence.


kc

--
---
Karen Coyle / Digital Library Consultant
kco...@kcoyle.net http://www.kcoyle.net
ph.: 510-540-7596   skype: kcoylenet
fx.: 510-848-3913
mo.: 510-435-8234



Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Mike Taylor
Houghton,Andrew writes:
  I have to say I am suspicious of schemes like PURL, which
  for all their good points introduce a single point of
  failure into, well, everything that uses them.  That can't
  be good.  Especially as it's run by the same company that
  also runs the often-unavailable OpenURL registry.

 What you are saying is that you are suspicious of the HTTP
 protocol.
   
   That is NOT what I am saying.
   
   I am saying I am suspicious of a single point of failure.
   Especially since the entire architecture of the Internet was
   (rightly IMHO) designed with the goal of avoiding SPOFs.
  
  OK, good, then if you are concerned about the PURL service's SPOF,
  take the freely available PURL software, create a distributed
  PURL-based system, and put it up for the community.

Why would  I want to do this when I could just Not Use PURLs?

Anyway, we're way off the subject now -- I guess if we want to argue
about the utility of PURL we could get a room :-)


 _/|____
/o ) \/  Mike Taylor  m...@indexdata.com  http://www.miketaylor.org.uk
)_v__/\  The cladistic definition of Aves is: an unimportant offshoot of
 the much cooler dinosaur family which somehow managed to survive
 the K/T boundary intact -- Eric Lurio.


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Karen Coyle

Houghton,Andrew wrote:


OK, good, then if you are concerned about the PURL service's SPOF, take
the freely available PURL software, create a distributed PURL-based
system, and put it up for the community.  I think several people have
looked at this, but I have not heard of any progress or implementations.


Andy.

  


The California Digital Library ran the PURL software for a while, using 
it to mint identifiers for digital documents. It was a while back, but 
someone there may remember how it went.


kc

--
---
Karen Coyle / Digital Library Consultant
kco...@kcoyle.net http://www.kcoyle.net
ph.: 510-540-7596   skype: kcoylenet
fx.: 510-848-3913
mo.: 510-435-8234



Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Karen Coyle
 Sent: Thursday, April 02, 2009 10:15 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] resolution and identification (was Re:
 [CODE4LIB] registering info: uris?)
 
 Houghton,Andrew wrote:
  RFC 3986 (URI generic syntax) says that "http:" is a URI scheme, not a
  protocol.  Just because it says "http" people make all kinds of
  assumptions about type of use, persistence, resolvability, etc.
 
 
 And RFC 2616 (Hypertext transfer protocol) says:
 
 The HTTP protocol is a request/response protocol. A client sends a
 request to the server in the form of a request method, URI, and
 protocol
 version, followed by a MIME-like message containing request modifiers,
 client information, and possible body content over a connection with a
 server.
 
 So what you are saying is that it's ok to use the URI for the hypertext
 transfer protocol in a way that ignores RFC 2616. I'm just not sure how
 functional that is, in the grand scheme of things.

You missed the whole point: URIs, specified by RFC 3986, are just tokens
that are divorced from protocols, like RFC 2616, but often work in conjunction
with them to retrieve a representation of the resource identified by the URI.
It is up to the protocol to decide which URI schemes it will accept.  In the
case of RFC 2616, there is a one-to-one relationship, today, with the HTTP URI
scheme.  RFC 2616 could also have said it would accept other URI schemes, or
another protocol could be defined, tomorrow, that also accepts the HTTP URI
scheme, giving the HTTP URI scheme a one-to-many relationship between the
scheme and the protocols that accept it.
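
To make the token/protocol split concrete, here is a small sketch (the
handler table is invented for the example; each application chooses its own)
of a consumer deciding which URI schemes it is willing to dereference while
still treating everything else as a perfectly good identifier:

    # Sketch: a client decides which URI schemes it will dereference; the
    # rest remain valid tokens it simply cannot resolve (illustrative only).
    from urllib.parse import urlparse

    def fetch_http(uri):
        return "would dereference %s over HTTP" % uri

    def send_mailto(uri):
        return "would hand %s to a mail user agent" % uri

    # This table is invented for the example.
    HANDLERS = {"http": fetch_http, "https": fetch_http, "mailto": send_mailto}

    def handle(uri):
        handler = HANDLERS.get(urlparse(uri).scheme)
        if handler is None:
            return "%s is still a valid identifier, but this client cannot dereference it" % uri
        return handler(uri)

    print(handle("http://example.org/thing"))
    print(handle("info:doi/10.1000/example"))   # made-up DOI; accepted as a token only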

 And when you say:
 
  The Cool URIs for the Semantic Web document describes how an HTTP
 protocol
  implementation (of RFC 2616) should respond to a dereference of an
 HTTP URI.
 
 I think you are deliberately distorting the intent of the Cool URIs
 document. You seem to read it as: *given* an http URI, here is how the
 protocol should respond. But in fact the Cool URIs document asks the
 question "So the question is, what URIs should we use in RDF?" and
 responds that one should use http URIs for the reason that:
 
 Given only a URI, machines and people should be able to retrieve a
 description about the resource identified by the URI from the Web. Such
 a look-up mechanism is important to establish shared understanding of
 what a URI identifies. Machines should get RDF data and humans should
 get a readable representation, such as HTML. The standard Web transfer
 protocol, HTTP, should be used.

The answer to the question posed in the document is based on Tim
Berners-Lee's four linked data principles, one of which states to
use HTTP URIs.  Nobody, as far as I know, has created a hypertext-based
system based on the URN or info URI schemes.  The only
hypertext-based system available today is the Web, which is based on
the HTTP protocol that accepts HTTP URIs.  So you cannot effectively
accomplish linked data on the Web without using HTTP URIs.

The document has an RDF / Semantic Web slant, but Tim Berners-Lee's
four linked data principles say nothing about RDF or the Semantic Web.
Those four principles might be more aptly named the four linked
information principles for the Web.  Further, the document does go on
to describe how an HTTP server (an implementation of RFC 2616) should
respond to requests for Real World Objects, Generic Documents and Web
Documents, which is based on the W3C TAG decisions for httpRange-14 and
genericResources-53.

The scope of the document clearly says:

  This document is a practical guide for implementers of the RDF 
   specification... It explains two approaches for RDF data hosted 
   on HTTP servers...

Section 2.1 discusses HTTP and content negotiation for Generic Documents.

Section 4 discusses, with diagrams and actual HTTP status codes, how the
HTTP server should respond to let user agents know which URIs are Real
World Objects vs. Generic Documents and Web Documents, per the W3C TAG
decisions on httpRange-14 and genericResources-53.
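
As a rough illustration of the behaviour those sections describe (the paths
and responses below are invented for the example, not taken from the
document), a server might answer a Real World Object URI with a 303 redirect
to a Web Document about it, and answer the Web Document URI with an ordinary
200:

    # Sketch of httpRange-14-style responses; paths and content are invented.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class CoolUriHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/id/alice":
                # URI for a Real World Object: 303 See Other to a document about it
                self.send_response(303)
                self.send_header("Location", "http://localhost:8080/doc/alice")
                self.end_headers()
            elif self.path == "/doc/alice":
                # URI for a Web Document: an ordinary 200 with a representation
                body = b"<html><body>A document describing Alice.</body></html>"
                self.send_response(200)
                self.send_header("Content-Type", "text/html")
                self.end_headers()
                self.wfile.write(body)
            else:
                self.send_error(404)

    if __name__ == "__main__":
        HTTPServer(("localhost", 8080), CoolUriHandler).serve_forever()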

Section 6 directly addresses the question that this thread has been talking
about, namely using new URI schemes, like URN and info, and why they are
not acceptable in the context of linked data.

And here is a quote which says what I have said over and over again about
URIs being tokens divorced from protocols:

  To be truly useful, a new scheme must be accompanied by a protocol 
   defining how to access more information about the identified resource.
   For example, the ftp:// URI scheme identifies resources (files on an 
   FTP server), and also comes with a protocol for accessing them (the 
   FTP protocol).

  Some of the new URI schemes provide no such protocol at all. Others 
   provide a Web Service that allows retrieval of descriptions using the 
   HTTP protocol. The identifier is passed to the service, which looks up

Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Mike Taylor
Karen Coyle writes:
   OK, good, then if you are concerned about the PURL service's SPOF,
   take the freely available PURL software, create a distributed
   PURL-based system, and put it up for the community.  I think
   several people have looked at this, but I have not heard of any
   progress or implementations.
  
  The California Digital Library ran the PURL software for a while,
  using it to mint identifiers for digital documents. It was a while
  back, but someone there may remember how it went.

Wait, what?  They _were_ running a PURL resolver, but now they're not?
What does the P in PURL stand for again?

 _/|____
/o ) \/  Mike Taylor  m...@indexdata.com  http://www.miketaylor.org.uk
)_v__/\  Wagner's music is nowhere near as bad as it sounds -- Mark
 Twain.


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Ray Denenberg, Library of Congress
You're right, if there were a "web:" URI scheme, the world would be a
better place.  But there isn't one, and the world is worse off for it.


It shouldn't surprise anyone that I am sympathetic to Karen's criticisms. 
Here is some of my historical perspective (which may well differ from 
others').


Back in the old days, URIs (or URLs) were protocol-based.  The ftp scheme
was for retrieving documents via ftp. The telnet scheme was for telnet. And
so on.  Some of you may remember the ZIG (Z39.50 Implementors Group) back
when we developed the z39.50 URI scheme, which was around 1995. Most of us
were not wise to the ways of the web that long ago, but we were told, by
those who were, that "z39.50r:" and "z39.50s:" at the beginning of a URL
are explicit indications that the URI is to be resolved by Z39.50.


A few years later the semantic web was conceived and a lot of SW people began
coining all manner of http URIs that had nothing to do with the http
protocol.  By the time the rest of the world noticed, there were so many
that it was too late to turn back. So instead, history was altered.  The
company line became "we never told you that the URI scheme was tied to a
protocol."


Instead, they should have bit the bullet and coined a new scheme.  They 
didn't, and that's why we're in the mess we're in.


--Ray


- Original Message - 
From: Houghton,Andrew hough...@oclc.org

To: CODE4LIB@LISTSERV.ND.EDU
Sent: Thursday, April 02, 2009 9:41 AM
Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] 
registering info: uris?)




From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Karen Coyle
Sent: Wednesday, April 01, 2009 2:26 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] resolution and identification (was Re:
[CODE4LIB] registering info: uris?)

This really puzzles me, because I thought http referred to a protocol:
hypertext transfer protocol. And when you put "http://" in front of
something you are indicating that you are sending the following string
along to be processed by that protocol. It implies a certain application
over the web, just as "mailto:" implies a particular application. Yes,
http is the URI for the hypertext transfer protocol. That doesn't
negate the fact that it indicates a protocol.


RFC 3986 (URI generic syntax) says that "http:" is a URI scheme, not a
protocol.  Just because it says "http" people make all kinds of
assumptions about type of use, persistence, resolvability, etc.  As I
indicated in a prior message, whoever registered the http URI scheme
could have easily used the token "web:" instead of "http:".  All the
URI scheme in RFC 3986 does is indicate what the syntax of the rest
of the URI will look like.  That's all.  You give an excellent
example: mailto.  The mailto URI scheme does not imply a particular
application.  It is a URI scheme with a specific syntax.  That URI
is often resolved with the SMTP (mail) protocol.  Whoever registered
the mailto URI scheme could have specified the token as "smtp:"
instead of "mailto:".


My reading of Cool URIs is
that they use the protocol, not just the URI. If they weren't intended
to take advantage of http then W3C would have used something else as a
URI. Read through the Cool URIs document and it's not about
identifiers,
it's all about using the *protocol* in service of identifying. Why use
http?


I'm assuming here that when you say "My reading of Cool URIs..." you mean
the "Cool URIs for the Semantic Web" document and not the "Cool URIs Don't
Change" document.  The "Cool URIs for the Semantic Web" document is about
linked data.  Tim Berners-Lee's four linked data principles state:

  1. Use URIs as names for things.
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information.
  4. Include links to other URIs. so that they can discover more things.

(2) is an important aspect of linking.  The Web is a hypertext-based system
that uses HTTP URIs to identify resources.  If you want to link, then you
need to use HTTP URIs.  There is only one protocol, today, that accepts
HTTP URIs as currency, and it's appropriately called HTTP and defined by
RFC 2616.

The Cool URIs for the Semantic Web document describes how an HTTP protocol
implementation (of RFC 2616) should respond to a dereference of an HTTP URI.
It's important to understand that URIs are just tokens that *can* be presented
to a protocol for resolution.  It's up to the protocol to define the currency
that it will accept, e.g., HTTP URIs, and it's up to an implementation of the
protocol to define the tokens of that currency that it will accept.

It just so happens that HTTP URIs are accepted by the HTTP protocol, but in
the case of mailto URIs they are accepted by the SMTP protocol.  However,
it is important to note that an HTTP user agent, e.g., a browser, accepts
both HTTP and mailto URIs.  It decides that it should send the mailto URI
to an SMTP user agent, e.g., Outlook, Thunderbird, etc

Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Mike Taylor
An account that has a depressing ring of accuracy to it.

Ray Denenberg, Library of Congress writes:
  You're right, if there were a "web:" URI scheme, the world would be a
  better place.  But there isn't one, and the world is worse off for it.
  
  It shouldn't surprise anyone that I am sympathetic to Karen's criticisms. 
  Here is some of my historical perspective (which may well differ from 
  others').
  
  Back in the old days, URIs (or URLs) were protocol-based.  The ftp scheme
  was for retrieving documents via ftp. The telnet scheme was for telnet. And
  so on.  Some of you may remember the ZIG (Z39.50 Implementors Group) back
  when we developed the z39.50 URI scheme, which was around 1995. Most of us
  were not wise to the ways of the web that long ago, but we were told, by
  those who were, that "z39.50r:" and "z39.50s:" at the beginning of a URL
  are explicit indications that the URI is to be resolved by Z39.50.
  
  A few years later the semantic web was conceived and a lot of SW people began
  coining all manner of http URIs that had nothing to do with the http
  protocol.  By the time the rest of the world noticed, there were so many
  that it was too late to turn back. So instead, history was altered.  The
  company line became "we never told you that the URI scheme was tied to a
  protocol."
  
  Instead, they should have bit the bullet and coined a new scheme.  They 
  didn't, and that's why we're in the mess we're in.
  
  --Ray
  
  

Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Erik Hetzner
Hi Ray -

At Thu, 2 Apr 2009 13:48:19 -0400,
Ray Denenberg, Library of Congress wrote:
 
 You're right, if there were a "web:" URI scheme, the world would be a
 better place.  But there isn't one, and the world is worse off for it.

Well, the original concept of the ‘web’ was, as I understand it, to
bring together all the existing protocols (gopher, ftp, etc.), with
the new one in addition (HTTP), with one unifying address scheme, so
that you could have this ‘web browser’ that you could use for
everything. So web: would have been nice, but probably wouldn’t have
been accepted.

As it turns out, HTTP won overwhelmingly, and the older protocols died
off.

 It shouldn't surprise anyone that I am sympathetic to Karen's
 criticisms. Here is some of my historical perspective (which may
 well differ from others').
 
 Back in the old days, URIs (or URLs) were protocol based. The ftp
 scheme was for retrieving documents via ftp. The telnet scheme was
 for telnet. And so on. Some of you may remember the ZIG (Z39.50
 Implementors Group) back when we developed the z39.50 URI scheme,
 which was around 1995. Most of us were not wise to the ways of the
 web that long ago, but we were told, by those who were, that
 "z39.50r:" and "z39.50s:" at the beginning of a URL are explicit
 indications that the URI is to be resolved by Z39.50.
 
 A few years later the semantic web was conceived and a lot of SW
 people began coining all manner of http URIs that had nothing to do
 with the http protocol. By the time the rest of the world noticed,
 there were so many that it was too late to turn back. So instead,
 history was altered. The company line became "we never told you that
 the URI scheme was tied to a protocol."
 
 Instead, they should have bit the bullet and coined a new scheme.  They 
 didn't, and that's why we're in the mess we're in.

Not knowing the details of the history, your account seems correct to
me, except that I don’t think the web people tried to alter history.

I think of the web as having been a learning experience for all of us.
Yes, we used to think that the URI was tied to the protocol. But we
have learned that it doesn't need to be, that HTTP URIs can be just
identifiers which happen to be dereferenceable at the moment using the
HTTP protocol.

And it became useful to begin identifying lots of things, people and
places and so on, using identifiers, and it also seemed useful to use
a protocol that existed (HTTP), instead of coming up with the
Person-Metadata Transfer Protocol and inventing a new URI scheme
(pmtp://...) to resolve metadata about persons. Because HTTP doesn’t
care what kind of data it is sending down the line; it can happily
send metadata about people.

But that is how things grow; the http:// at the beginning of a URI may
eventually be a spandrel, when HTTP is dead and buried. And people
will wonder why the address http://dx.doi.org/10./xxx has those
funny characters in front of it. And doi.org will be long gone,
because they ran out of money, and their domain was taken over by
squatters, so we all had to agree to alter our browsers to include an
override to not use DNS to resolve the dx.doi.org domain but instead
point to a new, distributed system of DOI resolution.

We will need to fix these problems as they arise.

In my opinion, if we are interested in identifier persistence, clarity
about the difference between things and information about things,
creating a more useful web (of data), and the other things we ought to
be interested in, our time is best spent worrying about these things,
and how they can be built on top of the web. Our time is not well
spent in coming up with new ways to do things that the web already does
for us.

For instance: if there is concern that HTTP URIs are not seen as being
persistent, it would be useful to try to add a method to HTTP which
indicated the persistence of an identifier. This way browsers could
display a little icon that indicated that the URI was persistent. A
user could click on this icon and get information about the
institution which claimed persistence for the URI, what the level of
support was, what other institution could back up that claim, etc.
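
Purely as a sketch of that hypothetical idea -- no such persistence header
exists in HTTP today, and the header name and URL below are invented -- a
client-side check might look something like this:

    # Hypothetical sketch only: HTTP defines no "Persistence" header; this
    # just illustrates what a client check could look like if one existed.
    from urllib.request import Request, urlopen

    def persistence_claim(uri):
        response = urlopen(Request(uri, method="HEAD"))
        # Hypothetical header naming the institution that claims persistence
        return response.headers.get("Persistence")

    claim = persistence_claim("http://example.org/some-persistent-uri")
    print("persistence claim:", claim or "none advertised")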

Our time would not be well spent coming up with an elaborate scheme
for phttp:// URIs, creating a better DNS, with name control by a
better institution, and a better HTTP, with metadata, and a better
caching system, and so on. This is a lot of work and you forget what
you were trying to do in the first place, which is make HTTP URIs
persistent.

best,
Erik
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




[CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-01 Thread Jonathan Rochkind

Houghton,Andrew wrote:

Lets separate your argument into two pieces. Identification and
resolution.  The DOI is the identifier and it inherently doesn't
tie itself to any resolution mechanism.  So creating an info URI
for it is meaningless, it's just another alias for the DOI.  I 
can create an HTTP resolution mechanism for DOI's by doing:


http://resolve.example.org/?doi=10./j.1475-4983.2007.00728.x

or

http://resolve.example.org/?uri=info:doi/10./j.1475-4983.2007.00728.x

since the info URI contains the natural DOI identifier, wrapping it
in a URI scheme has no value when I could have used the DOI identifier
directly, as in the first HTTP resolution example.
  


I disagree that wrapping it in a URI scheme has no value.  We have a great
deal of software and many schemas that are built to store URIs; even if they
don't know what the URI is or what can be done with it, we have
infrastructure in place for dealing with URIs.


So there is value in wrapping a 'natural' identifier in a URI, even if
that URI does not carry its own resolution mechanism with it. I have
run into this in several places in my own work.


I share Mike's concerns about tying resolution to identification in one 
mechanism.  As a sort of general principle or 'pattern' or design, 
trying to make one mechanism do two jobs at once is a 'bad smell'.  It's 
in fact (I hope this isn't too far afield) how I'd sum up much of the 
failure of AACR2/MARC, involving our 'controlled headings' (see me 
expanding on this in some blog posts at 
http://bibwild.wordpress.com/2008/01/17/identifiers-and-display-labels-again/).



On the other hand, it is awfully _convenient_ to combine these two 
functions in one mechanism. And convenience does matter too.


I can see both sides. So I think we just do what feels right, and when 
we all disagree on what feels right, we pick one. I don't share the 
opinion of those who think it's obvious that everything should be an 
http uri, nor do I share the opinion of those who think it's obvious 
that this is a disaster.


DOI is definitely one good example of where One Canonical Resolution 
fails.  The DOI _resolution_ system fails for me -- it does not reliably 
or predictably deliver the right document for my users.  But a DOI as an 
identifier is still useful for me.  Even if that DOI were expressed in a 
URI as http://dx.doi.org/resolve/10./j.1475-4983.2007.00728.x, I 
STILL wouldn't actually use the HTTP server at dx.doi.org to resolve 
it.  I'd extract the actual DOI out of it, and use a different 
resolution mechanism.
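
A rough sketch of that workflow, assuming the hypothetical
http://dx.doi.org/resolve/... form above (the local resolver base URL and
the DOI are likewise invented for the example):

    # Sketch: pull the bare DOI out of an HTTP URI and hand it to a different
    # resolver as an OpenURL rft_id (URI form and resolver base are assumptions).
    from urllib.parse import quote

    def extract_doi(uri, prefix="http://dx.doi.org/resolve/"):
        if not uri.startswith(prefix):
            raise ValueError("not a recognized DOI-carrying URI")
        return uri[len(prefix):]

    def local_openurl(doi, base="http://resolver.example.edu/openurl"):
        return "%s?rft_id=%s" % (base, quote("info:doi/" + doi, safe=""))

    doi = extract_doi("http://dx.doi.org/resolve/10.1000/example")
    print(local_openurl(doi))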


Another example to think about: what happens when the protocol for
resolution changes?  Right now we could already find a resolution
service starting to make available, and/or insist upon, https protocol
resolution.  But all those existing identifiers expressed as http URIs
should not change; they are meant to be persistent. So already it's
possible for an identifier originally intended to describe its own
resolution to be slightly wrong.  Is this confusing? In the future,
maybe we'll have something entirely different from http.



Jonathan


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-01 Thread Jonathan Rochkind
I admit that httprange-14 still confuses me. (I have no idea why it's 
called httprange-14 for one thing).


But how do you identify the URI as being a Real World Object? I don't 
understand what it entails.


And "http://doi.org/*" describes its own type only to software that
knows what a URI beginning http://doi.org means, right?

What about Eric Hellman's point that there are a variety of possible
http URIs (not just possible but _in use_) that encapsulate a DOI, and a
given piece of software would have to know all of the possible templates
(with more being created all the time)?


Jonathan

Houghton,Andrew wrote:

From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Jonathan Rochkind
Sent: Wednesday, April 01, 2009 11:08 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] resolution and identification (was Re: [CODE4LIB]
registering info: uris?)

Houghton,Andrew wrote:


Lets separate your argument into two pieces. Identification and
resolution.  The DOI is the identifier and it inherently doesn't
tie itself to any resolution mechanism.  So creating an info URI
for it is meaningless, it's just another alias for the DOI.  I
can create an HTTP resolution mechanism for DOI's by doing:

http://resolve.example.org/?doi=10./j.1475-4983.2007.00728.x

or

http://resolve.example.org/?uri=info:doi/10./j.1475-
  

4983.2007.00728.x


since the info URI contains the natural DOI identifier, wrapping it
in a URI scheme has no value when I could have used the DOI
  

identifier


directly, as in the first HTTP resolution example.

  

I disagree that wrapping it in a URI scheme has no value.  We have very
much software and schemas that are built to store URIs, even if they
don't know what the URI is or what can be done with it, we have
infrastructure in place for dealing with URIs.



Oops... that should have read ... wrapping it in an unresolvable URI
scheme...

The point being that:

urn:doi:*
info:doi:*

provide no advantages over:

http://doi.org/*

when, per the W3C TAG httpRange-14 decision, you identify the URI as being a
Real World Object.  When identifying the HTTP URI as a Real World Object,
it is the same as what Mike said about the info URI: the identifier
describes its own type.


Andy.

  


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-01 Thread Ross Singer
On Wed, Apr 1, 2009 at 11:37 AM, Jonathan Rochkind rochk...@jhu.edu wrote:
 I admit that httprange-14 still confuses me. (I have no idea why it's
 called httprange-14 for one thing).

http://www.w3.org/2001/tag/group/track/issues/14

Some background:
http://efoundations.typepad.com/efoundations/2009/02/httprange14-cool-uris-frbr.html

 And "http://doi.org/*" describes its own type only to software that
 knows what a URI beginning http://doi.org means, right?

How is that different from the software knowing what info:doi/ means?
The difference is, how much more software knows what http: means vs.
info:?

And this, I think, has got to be the point here.  How many times do we
need to marginalize ourselves with our ideals and expectations that
nobody else adheres to before we're rendered completely irrelevant?

Doesn't it make sense to coopt the mainstream processes and apply them
to our ideals?  What, exactly, is the resistance here?

 What about Eric Hellman's point that there are a variety of possible http
 URIs (not just possible but _in use_) that encapsulate a DOI, and given
 software would have to know all of the possible templates (with more being
 created all the time)?

Right, but here again is where we're talking about the difference
between a location and the identifier.

We're talking about establishing
http://dx.doi.org/10./j.1475-4983.2007.00728.x

(or something like that --
http://hdl.handle.net/10./j.1475-4983.2007.00728.x might be more
appropriate)

as the identifier for doi:10./j.1475-4983.2007.00728.x

That you can access it via
http://doi.wiley.com/10./j.1475-4983.2007.00728.x (or resolve it
there) doesn't mean that that's the identifier for it.

-Ross.


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-01 Thread Ray Denenberg, Library of Congress

From: Houghton,Andrew hough...@oclc.org


The point being that:

urn:doi:*
info:doi:*

provide no advantages over:

http://doi.org/*



I think they do.

I realize this is pretty much a dead-end debate as everyone has dug 
themselves into a position and nobody is going to change their mind. It is a 
philosophical debate and there isn't a right answer.  But in my opinion 


I won't use the doi example because it's overloaded.  Let's talk about the
hypothetical sudoc. I think info:sudoc/xyz provides an advantage over
http://sudoc.org/xyz if the latter is not going to resolve.


Why? Because it drives me nuts to see http URIs everywhere that give all
appearances of resolvability -- browsers, editors, etc., turn them into
clickable links.  Now, if you are setting up a resolution service where you
get the document that the sudoc identifies when you click on the URI, then
http is appropriate.  The *actual document*. Not a description of it in
lieu of the document.  And the so-called architectural justification that
it's OK to return metadata instead of the resource (representation) -- I
don't buy it.


--Ray 


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-01 Thread Ross Singer
On Wed, Apr 1, 2009 at 12:22 PM, Karen Coyle li...@kcoyle.net wrote:
 But shouldn't we be able to know the difference between an identifier and a
 locator? Isn't that the problem here? That you don't know which it is if it
 starts with http://.

But you do if it starts with http://dx.doi.org

I still don't see the difference.  The same logic that would be
required to parse and understand the info: URI scheme could just as
well be applied to an http URI scheme.

-Ross.


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-01 Thread Karen Coyle

Ross Singer wrote:

On Wed, Apr 1, 2009 at 12:22 PM, Karen Coyle li...@kcoyle.net wrote:
  

But shouldn't we be able to know the difference between an identifier and a
locator? Isn't that the problem here? That you don't know which it is if it
starts with http://.



But you do if it starts with http://dx.doi.org
  


No, *I* don't. And neither does my email program, since it displayed it 
as a URL (blue and underlined). That's inside knowledge, not part of the 
technology. Someone COULD create a web site at that address, and there's 
nothing in the URI itself to tell me if it's a URI or a URL.


The general convention is that "http://" is a web address, a location. I
realize that it's also a form of URI, but that's a minority use of http.
This leads to a great deal of confusion. I understand the desire to use
domain names as a way to create unique, managed identifiers, but the
http part is what is causing us problems.


John Kunze's ARK system attempted to work around this by using http to 
retrieve information about the URI, so you're not just left guessing. 
It's not a question of resolution, but of giving you a short list of 
things that you can learn about a URI that begins with http. However, 
again, unless you know the secret you have no idea that those particular 
URI/Ls have that capability. So again we're going beyond the technology 
into some human knowledge that has to be there to take advantage of the 
capabilities. It doesn't seem so far fetched to make it possible for 
programs (dumb, dumb programs) to know the difference between an 
identifier and a location based on something universal, like a prefix, 
without having to be coded for dozens or hundreds of exceptions.
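
A small sketch of that difference (the exception list below is invented for
illustration): a dumb program can classify info: or urn: tokens from the
prefix alone, but with http URIs it has to carry a hand-maintained list of
special cases:

    # Sketch of the "dozens of exceptions" problem; the list is invented.
    IDENTIFIER_ONLY_HTTP_PREFIXES = [
        "http://dx.doi.org/",
        "http://purl.org/",
        # ...dozens or hundreds more would have to be added by hand...
    ]

    def classify(uri):
        if uri.startswith("info:") or uri.startswith("urn:"):
            return "identifier (obvious from the prefix alone)"
        if any(uri.startswith(p) for p in IDENTIFIER_ONLY_HTTP_PREFIXES):
            return "identifier (but only because it is on our exception list)"
        if uri.startswith("http://") or uri.startswith("https://"):
            return "assumed to be an ordinary web location"
        return "unknown"

    for u in ("info:doi/10.1000/example",
              "http://dx.doi.org/10.1000/example",
              "http://www.kcoyle.net/"):
        print(u, "->", classify(u))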


kc


I still don't see the difference.  The same logic that would be
required to parse and understand the info: uri scheme could be used to
apply towards an http uri scheme.

-Ross.


  



--
---
Karen Coyle / Digital Library Consultant
kco...@kcoyle.net http://www.kcoyle.net
ph.: 510-540-7596   skype: kcoylenet
fx.: 510-848-3913
mo.: 510-435-8234



Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-01 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Karen Coyle
 Sent: Wednesday, April 01, 2009 1:06 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] resolution and identification (was Re:
 [CODE4LIB] registering info: uris?)
 
 The general convention is that "http://" is a web address, a location.
 I
 realize that it's also a form of URI, but that's a minority use of
 http.
 This leads to a great deal of confusion. I understand the desire to use
 domain names as a way to create unique, managed identifiers, but the
 http part is what is causing us problems.

http:// is an HTTP URI, defined by RFC 3986; loosely, I will agree that
it is a web address.  However, it is not a location.  URIs according
to RFC 3986 are just tokens to identify resources.  These tokens, i.e.,
URIs, are presented to protocol mechanisms as part of the dereferencing
process to locate and retrieve a representation of the resource.

People see "http:" and assume that it means the HTTP protocol, so it must
be a locator.  Whoever initially registered the HTTP URI scheme could
have used "web" as the token instead and we would all be doing:
web://example.org/.  This is the confusion.  People don't understand
what RFC 3986 is saying.  It makes no claim that any registered URI
scheme has persistence or can be dereferenced.  An HTTP URI is just a
token to identify some resource, nothing more.
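
As a small sketch of that reading of RFC 3986 (the example tokens are made
up), parsing a URI is purely syntactic and implies nothing about
dereferencing it:

    # Sketch: scheme and the rest of the token fall out of the generic syntax;
    # nothing here touches the network or assumes the URI can be resolved.
    from urllib.parse import urlparse

    for token in ("http://example.org/thing/42",
                  "info:doi/10.1000/example",
                  "mailto:someone@example.org"):
        parts = urlparse(token)
        print(token, "-> scheme:", parts.scheme, "| rest:", parts.netloc + parts.path)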


Andy.


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-01 Thread Ross Singer
My point is that I don't see how they're different in practice.

And one of them actually allowed you to do something from your email client.

-Ross.

On Wed, Apr 1, 2009 at 1:20 PM, Karen Coyle li...@kcoyle.net wrote:
 Ross, I don't get your point. My point was about the confusion between two
 things that begin: http:// but that are very different in practice. What's
 yours?

 kc

 Ross Singer wrote:

 Your email client knew what to do with:

 info:doi/10./j.1475-4983.2007.00728.x ?

 doi:10./j.1475-4983.2007.00728.x ?

 Or did you recognize the info:doi scheme and Google it?

 Or would this, in the case of 99% of the world, just look like gibberish
 or part of some nerd's PGP key?

 -Ross.


 --
 ---
 Karen Coyle / Digital Library Consultant
 kco...@kcoyle.net http://www.kcoyle.net
 ph.: 510-540-7596   skype: kcoylenet
 fx.: 510-848-3913
 mo.: 510-435-8234