Re: [CODE4LIB] registering info: uris?

2009-04-30 Thread Ross Singer
So hey, I'm sure nobody wanted to see this thread revived, but I'm hoping
you info uri folks can clear something up for me.

So I'm trying to gather together a vocabulary of identifiers to
unambiguously describe the format of the data you would be getting in
a Jangle feed or an UnAPI response (or any other variation on this
theme).  I have a MODS document and I want *you* to have it too!

Jakob Voss made the (reasonable) suggestion that rather than create
yet another identifier or registry to describe these formats, instead
it would make sense to use the work that the SRU:

http://www.loc.gov/standards/sru/resources/schemas.html

or OpenURL:

http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&set=Core:Metadata+Formats

communities have already done.  Which makes a lot of sense.  It would
be nice to use the same identifier in Jangle, SRU and OpenURL to say
that this is a MARCXML or ONIX record.

Except that OpenURL and SRU /already use different info URIs to
describe the same things/.

info:srw/schema/1/marcxml-v1.1

info:ofi/fmt:xml:xsd:MARC21

or

info:srw/schema/1/onix-v2.0

info:ofi/fmt:xml:xsd:onix

What is the rationale for this?  How do we keep up?  Are they
reusable?  Which one should be used?  Doesn't this pretty horribly
undermine the purpose of using info URIs in the first place?
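Until the two registries are aligned, one stopgap is a local crosswalk keyed on a neutral name. The following is only a minimal sketch in Python: the four info URIs are the ones quoted above, but the short keys ("marcxml", "onix") and the function name are made up for illustration, and it assumes the versioned SRU identifier and the unversioned OpenURL identifier can be treated as equivalent.

```python
# Hypothetical crosswalk between SRU and OpenURL format identifiers.
# Only the info URIs come from the registries quoted above; the short
# keys and the function name are illustrative assumptions.
FORMAT_IDS = {
    "marcxml": {
        "sru": "info:srw/schema/1/marcxml-v1.1",
        "openurl": "info:ofi/fmt:xml:xsd:MARC21",
    },
    "onix": {
        "sru": "info:srw/schema/1/onix-v2.0",
        "openurl": "info:ofi/fmt:xml:xsd:onix",
    },
}

# Reverse index: any known info URI -> neutral short name.
_BY_URI = {
    uri: name
    for name, ids in FORMAT_IDS.items()
    for uri in ids.values()
}


def same_format(uri_a: str, uri_b: str) -> bool:
    """Return True if both URIs are known and map to the same short name."""
    name_a = _BY_URI.get(uri_a)
    return name_a is not None and name_a == _BY_URI.get(uri_b)
```

A table like this has to be maintained by hand, which is exactly the "how do we keep up?" problem; it works around the mismatch rather than solving it.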

Is anybody else interested in working on a way to unambiguously say
"here is a Dublin Core resource as XML, but it is not OAI DC" or "this
is text/x-vcard, and it conforms to vCard 3.0", in a way that we can reuse
among all of our various ways of sharing data?

Thanks,
-Ross.


Re: [CODE4LIB] registering info: uris?

2009-04-30 Thread Ray Denenberg, Library of Congress

From: Ross Singer rossfsin...@gmail.com

Except that OpenURL and SRU /already use different info URIs to
describe the same things/.

info:srw/schema/1/marcxml-v1.1

info:ofi/fmt:xml:xsd:MARC21

or

info:srw/schema/1/onix-v2.0

info:ofi/fmt:xml:xsd:onix

What is the rationale for this?


None.  (Or, whatever rationale there was, historically, should no longer 
apply.)  These should be aligned.   Post this to the OpenURL list (and 
perhaps SRU as well).  I'm certainly willing to work to come up with a 
solution.


--Ray


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-16 Thread Mike Taylor
Jonathan Rochkind writes:
  There are trade-offs.  I think a lot of that TAG stuff privileges
  the theoretically pure over the on the ground practicalities.
  They've got a great fantasy in their heads of what the semantic web
  _could_ be, and I agree it's theoretically sound and _could_ be;
  but you've got to make it convenient and cheap if you actually want
  it to happen for real, sometimes sacrificing theoretical purity.
  And THAT'S one important lesson of the success of the WWW.

Very true and very important.  I've seen this stated most succinctly
by Clay Shirky:

You cannot simultaneously have mass adoption and rigor.

I hope one day I can come up with eight words as pithy as that.

 _/|____
/o ) \/  Mike Taylor    m...@indexdata.com    http://www.miketaylor.org.uk
)_v__/\  Good craftsmanship may not be art, but good art incorporates
 good craftsmanship -- Jane MacDonald.


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-15 Thread Jonathan Rochkind

Alexander Johannesen wrote:


I think you are quite mistaken on this, but before we leap into whether
the web is suitable for SuDoc I'd rather point out that SuDoc isn't
web friendly in itself, and *that* more than anything stands in the
way of using them with the web.

It stands in the way of using them in the fully realized sem web vision.

It does NOT stand in the way of using them in many useful ways that I 
can and want to use them _right now_. Ways which having a URI to refer 
to them are MUCH helped by. Whether it can resolve or not (YOU just made 
the point that a URI doesn't actually need to resolve, right? I'm still 
confused by this having it both ways -- URIs don't need to resolve, but 
if your URIs don't resolve then you're doing it wrong. Huh?), if you 
have a URI for a SuDoc you can use it in any infrastructure set up to 
accept, store, and relate URIs. Like an OpenURL rft_id, and, yeah, like 
RDF even.  You can make statements about a SuDoc if it has a URI, 
whether or not it resolves, whether or not SuDoc itself is 'web 
friendly'.  One step at a time.
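As a rough illustration of "any infrastructure set up to accept, store, and relate URIs", here is a minimal Python sketch that carries a SuDoc URI as an OpenURL rft_id without ever resolving it. The URI prefix and the SuDoc number are invented for the example (nothing here is a registered scheme or a real document), and the query is reduced to two keys.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical SuDoc URI: the example.org prefix and the SuDoc number are
# assumptions made up for this sketch.
SUDOC_URI = "http://example.org/sudoc/Y%204.F%2076/1:AF%208/12"


def openurl_query(rft_id: str) -> str:
    """Build a KEV-style OpenURL query string carrying a single rft_id."""
    return urlencode({"url_ver": "Z39.88-2004", "rft_id": rft_id})


query = openurl_query(SUDOC_URI)

# The receiving end gets the identifier back unchanged; no HTTP request
# to the URI itself is needed at any point.
received = parse_qs(query)["rft_id"][0]
```

The point of the sketch is only that the URI travels as an opaque string; whether it resolves never comes up.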


This is my frustration with semantic web stuff, making it harder to do 
things that we _could_ do right here and now, because it violates a 
fantasy of an ideal infrastructure that we may never actually have.


There are business costs, as well as technical problems, to be solved to 
create that ideal fantasy infrastructure. The business costs are _real_.



 Also, having a unified resolver for
SuDoc isn't hard, can be at a fixed URL, and use a parameter for
identifiers. You don't need to snoop the non-parameterized section of
an URI to get the ID's ;
  
Okay, Alex, why don't you set this up for us then? And commit to 
providing it persistently indefinitely? Because I don't have the 
resources to do that.  And for the use cases I am confronted with, I 
don't _need_ it, any old URI, even not resolvable, will do--yes, as long 
as I can recognize it as a SuDoc and extract the bare SuDoc out of it. 
Which you say I shouldn't be doing (while others say that's a 
mis-reading of those docs to think I shouldn't be doing it) -- but 
avoiding doing that would raise the costs of my software quite a bit, 
and make the feature infeasible in the first place. Business costs and 
resources _matter_.


I'm being a bit disingenuous here, because rsinger actually already 
_has_ set something like this up, using purl.org. Which isn't perfect, 
but it's there, so fine. I still don't even need it for what I'm doing.




No it's not; if you design your system RESTfully (which, indeed, HTTP
is) then the discovery part can be fast, cached, and using URI
templates embedded in HTTP responses, fully flexible and fit for your
purposes.
  


Feel free to contribute code to my open source project (Umlaut) to 
accomplish the things I need to do in an efficient manner while making 
an HTTP request for every single rft_id that comes in.  These URIs are 
_external_ URIs from third parties, I have no control over whether they 
are designed RESTfully or not.  But you contribute the code, and it's 
good code, I'll be happy to use it.


In the meantime, I'll continue trying to balance functionality, 
maintainability, future expansion, and the programming and hardware 
resources available to me, same as I always do, here in the real world 
when we're building production apps, not R&D experiments, where we don't 
have complete control over the entire environment we operate in. You 
telling me that everything would work great _if only_ everyone in the 
whole world that I need to inter-operate with did things the way you say 
they should -- does absolutely nothing for me. 

And this, again, is my frustration with many of these semantic web 
arguments I'm hearing -- describing an ideal fantasy world that doesn't 
exist, but insisting we act as if it does, even if that means putting 
barriers in the way of actually getting things done.  I'd like to 
actually get things done while moving bit-by-bit toward the semantic web 
vision. I can't if the semantic web vision insists that everything must 
be perfect, and disallows alternate solutions, alternate trade-offs, and 
alternate compromises. I don't have time for that, I'm building actual 
production apps with limited resources.


Jonathan


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-15 Thread Alexander Johannesen
Hiya,

On Thu, Apr 16, 2009 at 01:10, Jonathan Rochkind rochk...@jhu.edu wrote:
 It stands in the way of using them in the fully realized sem web vision.

Ok, I'm puzzled. How? As the SemWeb vision is all about first-order
logic over triplets, and the triplets are defined as URIs, if you can
pop something into a URI you're good to go. So how is it that SuDoc
doesn't fit into this, as you *can* chuck it in a URI? I said it was
unfriendly to the Web, not impossible.

 It does NOT stand in the way of using them in many useful ways that I can
 and want to use them _right now_.

Ah, but then go fix it.

 Ways which having a URI to refer to them
 are MUCH helped by. Whether it can resolve or not (YOU just made the point
 that a URI doesn't actually need to resolve, right? I'm still confused by
 this having it both ways -- URIs don't need to resolve, but if your URIs
 don't resolve then you're doing it wrong. Huh?)

C'mon, it ain't *that* hard. :) URIs as identifiers is fine, having
them resolve as well is great. What's so confusing about that?

 , if you have a URI for a
 SuDoc you can use it in any infrastructure set up to accept, store, and
 relate URIs. Like an OpenURL rft_id, and, yeah, like RDF even.  You can make
 statements about a SuDoc if it has a URI, whether or not it resolves,
 whether or not SuDoc itself is 'web friendly'.  One step at a time.

 This is my frustration with semantic web stuff, making it harder to do
 things that we _could_ do right here and now, because it violates a fantasy
 of an ideal infrastructure that we may never actually have.

Huh? The people who made SuDoc didn't make it web friendly, and thus
the SemWeb stuff is harder to do because it lives on the web? (And
chucking your meta data into HTML as MF or RDF snippets ain't that
hard, it just require a minimum of knowledge)

 There are business costs, as well as technical problems, to be solved to
 create that ideal fantasy infrastructure. The business costs are _real_

No more real than the cost currently in place. The thing is that a lot
of people see the traditional cost disappear with the advent of SemWeb
and the new costs heavily reduced.

  Also, having a unified resolver for
 SuDoc isn't hard, can be at a fixed URL, and use a parameter for
 identifiers. You don't need to snoop the non-parameterized section of
 an URI to get the ID's ;

 Okay, Alex, why don't you set this up for us then?

Why? I don't give a rats bottom about SuDoc, don't need it, think it's
poorly designed, and gives me nothing in life. Why should I bother?
(Unless I'm given money for it, then I'll start caring ... :)

 And commit to providing
 it persistently indefinitely? Because I don't have the resources to do that.

Who's behind SuDoc, and are they serious about their creation? Those are
the people you should direct your anger at instead.

  And for the use cases I am confronted with, I don't _need_ it, any old URI,
 even not resolvable, will do--yes, as long as I can recognize it as a SuDoc
 and extract the bare SuDoc out of it.

So what's the problem with just making some stuff up? If you can do
your thing in a vacuum I don't fully understand your problem with the
SemWeb stuff? If you don't want it, don't use it.

 Which you say I shouldn't be doing
 (while others say that's a mis-reading of those docs to think I shouldn't be
 doing it)

No, I think this one is the subtle difference between a URL and a URI.

 but avoiding doing that would raise the costs of my software
 quite a bit, and make the feature infeasible in the first place. Business
 costs and resources _matter_.

As with anything on the Web, you work with what you got, and if you
can fix and share your fix, we all will love you for it. I seriously
don't think I understand what you're getting at here; it's been this
way since the Web popped into existence, and I don't really want it to
be any other way.

 No it's not; if you design your system RESTfully (which, indeed, HTTP
 is) then the discovery part can be fast, cached, and using URI
 templates embedded in HTTP responses, fully flexible and fit for your
 purposes.

 These URIs are
 _external_ URIs from third parties, I have no control over whether they are
 designed RESTfully or not.

Not sure I follow this one. There are no good or bad RESTful URIs,
just URIs. REST is how your framework works with the URIs.

 In the meantime, I'll continue trying to balance functionality,
 maintainability, future expansion, and the programming and hardware
 resources available to me, same as I always do, here in the real world when
 we're building production apps, not R&D experiments

My day job is to balance functionality, maintainability, future
expansion, and the programming and hardware resources available to me,
same as I always do, here in the real world when we're building
production apps ... and I'm using Topic Maps and SemWeb technologies.
Is there something I'm doing which degrades my work to an R&D
experiment, something I should let my customers 

Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-14 Thread Jonathan Rochkind
The difference between URIs and URLs?  I don't believe that URL is something 
that exists any more in any standard, it's all URIs. Correct me if I'm wrong. 

I don't entirely agree with either dogmatic side here, but I do think that 
we've arrived at an awfully confusing (for developers) environment. Re-reading 
the various semantic web TAG position papers people keep referencing, I 
actually don't entirely agree with all of their principles in practice. 

Jonathan

From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Alexander 
Johannesen [alexander.johanne...@gmail.com]
Sent: Tuesday, April 14, 2009 9:27 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] 
registering info: uris?)

Hiya,

Been meaning to jump into this discussion for a while, but I've been
off to an alternative universe and I can't even say it's good to be
back. :) Anyhoo ...

On Fri, Apr 3, 2009 at 03:48, Ray Denenberg, Library of Congress
r...@loc.gov wrote:
 You're right, if there were a web:  URI scheme, the world would be a
 better place.   But it's not, and the world is worse off for it.

I'm rather confused by this statement. The web: URI scheme? The Web
*is* the URI scheme; they are all identifiers to resources (ftp: http:
gopher: https: etc.), and together they make up, the, um, web of
things. What am I missing?

 Back in the old days, URIs (or URLs)  were protocol based.

No, which one do you mean, URIs or URLs?

 The ftp scheme
 was for retrieving documents via ftp. The telnet scheme was for telnet. And
 so on.

Again, have I missed something? This has changed, as opposed to the
good old days?

 A few years later the semantic web was conceived and a lot of SW people began
 coining all manner of http URIs that had nothing to do with the http
 protocol.

I've been browsing back and forth this discussion, and couldn't find
much to back this up. What do you mean by this?

 Instead, they should have bit the bullet and coined a new scheme.  They
 didn't, and that's why we're in the mess we're in.

I'm sorry, but mess? Did you know the messiness of the web is
probably what made it successful? Not to mention that having URIs be
identifiers *and* have the ability to resolve them is a bonus; they're
identifiers of things (as they've always been, as I'm sure you know
URI stands for Uniform Resource Identifier, right? :), as in they
consist of a string of characters used to identify or name a resource
on the Internet. And then, if you so choose, you can use the protocol
level to *resolve* them. Not sure how anyone can consider this to be
bad, though.

Or is this just a misunderstanding of the difference between URIs and URLs?


Kind regards,

Alexander
--
---
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
-- http://shelter.nu/blog/ 


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-14 Thread Ray Denenberg, Library of Congress

From: Jonathan Rochkind rochk...@jhu.edu


The difference between URIs and URLs?  I don't believe that URL is 
something that exists any more in any standard, it's all URIs.


The URL is alive and well.

The W3C definition, http://www.w3.org/TR/uri-clarification/
a URL is a type of URI that identifies a resource via a representation of 
its primary access mechanism (e.g., its network location), rather than by 
some other attributes it may have. Thus as we noted, http: is a URI 
scheme. An http URI is a URL.
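Read literally, that definition makes the URL/URI distinction a matter of scheme. A minimal Python sketch of that reading follows; the set of schemes counted as "locator" schemes is an assumption chosen for illustration and does not come from the W3C note or any registry.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: these schemes identify a resource by its
# primary access mechanism. The list is illustrative, not normative.
LOCATOR_SCHEMES = {"http", "https", "ftp"}


def is_url(uri: str) -> bool:
    """Return True if the URI's scheme is one we treat as a locator scheme."""
    return urlsplit(uri).scheme in LOCATOR_SCHEMES
```

Under this reading every http URI is a URL by construction, while an info URI such as info:srw/schema/1/marcxml-v1.1 is a URI but not a URL.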


SRU, for example, considers its request to be a URL.

I do think this conversation has played itself out.   --Ray


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-14 Thread Alexander Johannesen
On Tue, Apr 14, 2009 at 23:34, Jonathan Rochkind rochk...@jhu.edu wrote:
 The difference between URIs and URLs?  I don't believe that URL is 
 something that exists any more in any standard, it's all URIs. Correct me if 
 I'm wrong.

Sure it exists: URLs are a subset of URIs. URLs are locators as
opposed to just identifiers (which is an important distinction, much
used in SemWeb lingo), where URLs are closer to the protocol-like
things Ray describes (or so I think).

 I don't entirely agree with either dogmatic side here, but I do think that 
 we've arrived at an
 awfully confusing (for developers) environment.

But what about it is confusing (apart from us having this discussion
:) ? Is it that we have IDs that happen to *also* resolve? And why is
that confusing?

 Re-reading the various semantic web TAG position papers people keep
 referencing, I actually don't entirely agree with all of their principles in 
 practice.

Well, let me just say that there's more to SemWeb than what comes out of W3C. :)


Kind regards,

Alex
-- 
---
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
-- http://shelter.nu/blog/ 


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-14 Thread Jonathan Rochkind
Can you show me where this definition of a URL vs. a URI is made in any RFC 
or standard-like document?

Sure, we have a _sense_ of how the connotation is different, but I don't think 
that sense is actually formalized anywhere. And that's part of what makes it 
confusing, yeah.  I think the sem web crowd actually embraces this 
confusingness, they want to have it both ways: Oh, a URI doesn't need to 
resolve, it's just an opaque identifier; but you really should use http URIs 
for all URIs; why? because it's important that they resolve. 

In general, combining two functions in one mechanism is a dangerous and 
confusing thing to do in data design, in my opinion. By analogy, it's what gets 
a lot of MARC/AACR2 into trouble.  It's also often a very convenient thing to 
do, and convenience matters. Although ironically, my problem with some of those 
TAG documents is actually that they privilege pure theory over practical 
convenience. 

Over in: http://www.w3.org/2001/tag/doc/URNsAndRegistries-50-2006-08-17.html

They suggest, under URI opacity: 'Agents making use of URIs SHOULD NOT attempt to 
infer properties of the referenced resource.'

I understand why that makes sense in theory, but it's entirely impractical for 
me, as I discovered with the SuDoc experiment (which turned out to be a useful 
experiment at least in understanding my own requirements).  If I get a URI 
representing (eg) a Sudoc (or an ISSN, or an LCCN), I need to be able to tell 
from the URI alone that it IS a Sudoc, AND I need to be able to extract the 
actual SuDoc identifier from it.  That completely violates their Opacity 
requirement, but it's entirely infeasible to require me to make an individual 
HTTP request for every URI I find, to figure out what it IS.  Infeasible for 
performance and cost reasons, and infeasible because it requires a lot more 
development effort at BOTH ends -- it means that every single URI _would_ have 
to de-reference to an RDF representation capable of telling me it identifies a 
SuDoc and what the actual bare SuDoc is. Contrary to the protestations that a 
URI is different than a URL and does not need to resolve, following 
the opacity recommendation/requirement would mean that resolution 
would be absolutely required in order for me to use it.   Meaning that someone 
minting the URI would have to provide that infrastructure, and I as a client 
would have to write code to use it.  
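For comparison, the non-opaque approach is a few lines of string handling. This is a minimal Python sketch, not Umlaut's actual code: the prefix is a hypothetical minting convention and the SuDoc number in the usage note is invented.

```python
from typing import Optional
from urllib.parse import unquote

# Hypothetical prefix under which SuDoc URIs are assumed to be minted.
SUDOC_PREFIX = "http://example.org/sudoc/"


def extract_sudoc(uri: str) -> Optional[str]:
    """Return the bare SuDoc number if the URI follows the assumed
    convention, or None if it does not look like a SuDoc URI."""
    if not uri.startswith(SUDOC_PREFIX):
        return None
    bare = unquote(uri[len(SUDOC_PREFIX):])
    return bare or None
```

Given "http://example.org/sudoc/Y%204.F%2076/1:AF%208/12" this returns "Y 4.F 76/1:AF 8/12", and it returns None for any URI outside the prefix, all without a network round trip.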

But I just want a darn SuDoc in a URI -- and there are advantages to putting a 
SuDoc in a URI _precisely_ so it can be used in URI-using infrastructures like 
RDF, and these advantages hold _even if_ it's not resolvable and we ignore the 
'opacity' recommendation. There are trade-offs.  I think a lot of that TAG 
stuff privileges the theoretically pure over the on the ground practicalities. 
They've got a great fantasy in their heads of what the semantic web _could_ be, 
and I agree it's theoretically sound and _could_ be; but you've got to make it 
convenient and cheap if you actually want it to happen for real, sometimes 
sacrificing theoretical purity.   And THAT'S one important lesson of the 
success of the WWW. 

Jonathan




Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-14 Thread Jonathan Rochkind
Thanks Ray. By that definition ALL http URIs are URLs, a priori.  I read 
Alexander as trying to make a different distinction.



  


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-14 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Jonathan Rochkind
 Sent: Tuesday, April 14, 2009 10:21 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] resolution and identification (was Re:
 [CODE4LIB] registering info: uris?)
 
 Over in: http://www.w3.org/2001/tag/doc/URNsAndRegistries-50-2006-08-17.html
 
 They suggest, under URI opacity: 'Agents making use of URIs SHOULD NOT
 attempt to infer properties of the referenced resource.'
 
 I understand why that makes sense in theory, but it's entirely
 impractical for me, as I discovered with the SuDoc experiment (which
 turned out to be a useful experiment at least in understanding my own
 requirements).  If I get a URI representing (eg) a Sudoc (or an ISSN,
 or an LCCN), I need to be able to tell from the URI alone that it IS a
 Sudoc, AND I need to be able to extract the actual SuDoc identifier
 from it.  That completely violates their Opacity requirement, but it's
 entirely infeasible to require me to make an individual HTTP request
 for every URI I find, to figure out what it IS.

Jonathan, you need to take URI opacity in context.  The document is correct
in suggesting that user agents should not attempt to infer properties of
the referenced resource.  The Architecture of the Web is also clear on this
point and includes an example.  Just because a resource URI ends in .html
does not mean that HTML will be the representation being returned.  The
user agent is inferring a property by looking at the end of the URI to see
if it ends in .html, e.g., that the Web Document will be returning HTML.  If 
you really want to know for sure you need to dereference it with a HEAD 
request.

Now having said that, URI opacity applies to user agents dealing with *any*
URIs that they come across in the wild.  They should not try to infer any
semantics from the URI itself.  However, this doesn't mean that the minter
of a URI cannot create a policy decision for a group of URIs under their
control that contain semantics.  In your example, you made a policy 
decision about the URIs you were minting for SUDOCs such that the actual
SUDOC identifier would appear someplace in the URI.  This is perfectly
fine and is the basis for REST URIs, but understand you created a specific
policy statement for those URIs, and if a user agent is aware of your policy
statements about the URIs you mint, then they can infer semantics from
the URIs you minted.

Does that break URI opacity from a user agent's perspective?  No.  It just
means that those user agents who know about your policy can infer semantics
from your URIs and those that don't should not infer any semantics because
they don't know what the policies are, e.g., you could be returning PDF
representations when the URI ends in .html, if that was your policy, and
the only way for a user agent to know that is to dereference the URI with 
either HEAD or GET when they don't know what the policies are.
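The two cases can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the policy table, its example.org prefix, and the function names are hypothetical, and the HEAD helper simply reports whatever media type the server declares.

```python
import urllib.request
from typing import Optional

# Hypothetical table of minters whose URI policies this agent knows about.
# Each entry maps a URI prefix to the kind of identifier the policy promises.
KNOWN_POLICIES = {
    "http://example.org/sudoc/": "sudoc",
}


def infer_kind(uri: str) -> Optional[str]:
    """Infer semantics from the URI only when a known policy covers it."""
    for prefix, kind in KNOWN_POLICIES.items():
        if uri.startswith(prefix):
            return kind
    return None  # no known policy: treat the URI as opaque


def head_request(uri: str) -> urllib.request.Request:
    """Prepare (but do not send) a HEAD request for an opaque URI."""
    return urllib.request.Request(uri, method="HEAD")


def media_type(uri: str) -> str:
    """Dereference the URI with HEAD and return the declared media type."""
    with urllib.request.urlopen(head_request(uri), timeout=10) as resp:
        return resp.headers.get_content_type()
```

An agent would call infer_kind first and fall back to media_type only when it returns None, so a URI ending in .html is never assumed to be HTML unless a known policy says so.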


Andy.


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-14 Thread Jonathan Rochkind
Am I not an agent making use of a URI who is attempting to infer 
properties from it? Like that it represents a SuDoc, and in particular 
what that SuDoc is?


If this kind of talmudic parsing of the TAG recommendations to figure 
out what they _really_ mean is necessary, I stand by my statement that 
the environment those TAG documents are encouraging is a confusing one.


Jonathan


  


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-14 Thread Joe Atzberger
The User Agent is understood to be a typical browser, or other piece of
software, like wget, curl, etc.  It's the thing implementing the client side
of the specs.  I don't think you are operating as a user agent here as
much as you are a server application.  That is, assuming I have any idea
what you're actually doing.

--Joe

On Tue, Apr 14, 2009 at 11:27 AM, Jonathan Rochkind rochk...@jhu.eduwrote:

 Am I not an agent making use of a URI who is attempting to infer properties
 from it? Like that it represents a SuDoc, and in particular what that SuDoc
 is?

 If this kind of talmudic parsing of the TAG reccommendations to figure out
 what they _really_ mean is neccesary, I stand by my statement that the
 environment those TAG documents are encouraging is a confusing one.

 Jonathan


 Houghton,Andrew wrote:

 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Jonathan Rochkind
 Sent: Tuesday, April 14, 2009 10:21 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] resolution and identification (was Re:
 [CODE4LIB] registering info: uris?)

 Over in: http://www.w3.org/2001/tag/doc/URNsAndRegistries-50-2006-08-
 17.html

 They suggest: URI opacity'Agents making use of URIs SHOULD NOT
 attempt to infer properties of the referenced resource.'

 I understand why that makes sense in theory, but it's entirely
 impractical for me, as I discovered with the SuDoc experiment (which
 turned out to be a useful experiment at least in understanding my own
 requirements).  If I get a URI representing (eg) a Sudoc (or an ISSN,
 or an LCCN), I need to be able to tell from the URI alone that it IS a
 Sudoc, AND I need to be able to extract the actual SuDoc identifier
 from it.  That completely violates their Opacity requirement, but it's
 entirely infeasible to require me to make an individual HTTP request
 for every URI I find, to figure out what it IS.



 Jonathan, you need to take URI opacity in context.  The document is correct
 in suggesting that user agents should not attempt to infer properties of
 the referenced resource.  The Architecture of the Web is also clear on this
 point and includes an example.  Just because a resource URI ends in .html
 does not mean that HTML will be the representation being returned.  The
 user agent is inferring a property by looking at the end of the URI to see
 if it ends in .html, e.g., that the Web Document will be returning HTML.
  If you really want to know for sure you need to dereference it with a HEAD
 request.

 Now having said that, URI opacity applies to user agents dealing with *any*
 URIs that they come across in the wild.  They should not try to infer any
 semantics from the URI itself.  However, this doesn't mean that the minter
 of a URI cannot create a policy decision for a group of URIs under their
 control that contain semantics.  In your example, you made a policy
 decision about the URIs you were minting for SUDOCs such that the actual
 SUDOC identifier would appear someplace in the URI.  This is perfectly
 fine and is the basis for REST URIs, but understand you created a specific
 policy statement for those URIs, and if a user agent is aware of your policy
 statements about the URIs you mint, then they can infer semantics from
 the URIs you minted.

 Does that break URI opacity from a user agent's perspective?  No.  It just
 means that those user agents who know about your policy can infer semantics
 from your URIs and those that don't should not infer any semantics because
 they don't know what the policies are, e.g., you could be returning PDF
 representations when the URI ends in .html, if that was your policy, and
 the only way for a user agent to know that is to dereference the URI with
 either HEAD or GET when they don't know what the policies are.


 Andy.
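To make the "policy" point concrete, here is a minimal sketch of an agent that only reads meaning out of URIs whose minting policy it already knows, and treats everything else as opaque. The prefix and the percent-encoding convention are assumptions made up for this illustration (the purl.org/NET/sudoc path is mentioned elsewhere in the thread, but no actual policy is documented here).

```python
from urllib.parse import unquote

# Hypothetical minting policy: every URI under this prefix carries a
# percent-encoded SuDoc number as the remainder of its path.
SUDOC_POLICY_PREFIX = "http://purl.org/NET/sudoc/"


def sudoc_from_uri(uri):
    """Return the SuDoc number if the URI falls under the known policy, else None."""
    if not uri.startswith(SUDOC_POLICY_PREFIX):
        # No published policy that we know of: treat the URI as opaque.
        return None
    encoded = uri[len(SUDOC_POLICY_PREFIX):]
    return unquote(encoded) or None


if __name__ == "__main__":
    print(sudoc_from_uri("http://purl.org/NET/sudoc/Y%204.F%2076/1:AF%208/12"))
    print(sudoc_from_uri("http://example.org/report.html"))
```

An agent written this way never guesses from a ".html" suffix or similar; for URIs outside the known prefix it would have to fall back on dereferencing with HEAD or GET.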






Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-14 Thread Alexander Johannesen
On Wed, Apr 15, 2009 at 00:20, Jonathan Rochkind rochk...@jhu.edu wrote:
 Can you show me where this definition of a URL vs. a URI is made in any 
 RFC or standard-like document?

From http://www.faqs.org/rfcs/rfc3986.html ;

1.1.3.  URI, URL, and URN

   A URI can be further classified as a locator, a name, or both.  The
   term Uniform Resource Locator (URL) refers to the subset of URIs
   that, in addition to identifying a resource, provide a means of
   locating the resource by describing its primary access mechanism
   (e.g., its network location).  The term Uniform Resource Name
   (URN) has been used historically to refer to both URIs under the
   urn scheme [RFC2141], which are required to remain globally unique
   and persistent even when the resource ceases to exist or becomes
   unavailable, and to any other URI with the properties of a name.

   An individual scheme does not have to be classified as being just one
   of name or locator.  Instances of URIs from any given scheme may
   have the characteristics of names or locators or both, often
   depending on the persistence and care in the assignment of
   identifiers by the naming authority, rather than on any quality of
   the scheme.  Future specifications and related documentation should
   use the general term URI rather than the more restrictive terms
   URL and URN [RFC3305].

As you can see, a URI is an identifier, and a URL is a locator
(mechanism for retrieval), and since URLs are a subset of URIs, you
_can_ resolve URIs as well.

 Sure, we have a _sense_ of how the connotation is different, but
 I don't think that sense is actually formalized anywhere.

It is, and the same stuff is documented in WikiPedia as well ;

   http://en.wikipedia.org/wiki/Uniform_Resource_Identifier
   http://en.wikipedia.org/wiki/Uniform_Resource_Locator

 I think the sem web crowd actually embraces this confusingness,

No, I think they take it at face value; they (the URIs) are
identifiers for things, and can be used for just that purpose, but
they are also URLs, which means they resolve to something. What I think
you're getting at is the something it resolves to, as *that*
has no definition. But then, if you go from RDF to Topic Maps PSIs
(PSIs are URIs with an extended meaning), *that* thing it resolves to
indeed has a definition; it's the prose explaining what the identifier
identifies, and this is the most important difference between RDF and
Topic Maps (and a very subtle but important difference, too).

 they want to have it both ways: Oh, a URI doesn't need to resolve,
 it's just an opaque identifier; but you really should use http URIs
 for all URIs; why? because it's important that they resolve.

I smell straw-man. :) But yes, they do want both, as both is in fact a
friggin' smart thing to have. We all deal with identifiers all the
time, in internal as well as external applications, so why not use an
identifier scheme that has the added bonus of a resolver
mechanism? If you want to be stupid and lock yourself in your limited
world, then using them as just identifiers is fine but perhaps a bit,
well, stupid. But if you want to be smart about it, realizing that
without ontological work there will *never* be proper interop, you use
those identifiers and let them resolve to something. And if you're
really smart, you let them resolve to either more RDF statements, or,
if you're seriously Einsteinly smart, use PSIs (as in Topic Maps) :).

 In general, combining two functions in one mechanism is a
 dangerous and confusing thing to do in data design, in my opinion.

Because ... ?

 By analogy, it's what gets a lot of MARC/AACR2 into trouble.

Hmm, and I thought it was crap design that did that, coupled with poor
metadata constraints and validation channels, untyped fields, poor
tooling, the lack of machine understandability, and the general
library idiom of "not invented here". But correct me if I'm wrong. :)

 Over in: http://www.w3.org/2001/tag/doc/URNsAndRegistries-50-2006-08-17.html

Umm, I'd be wary to take as canon a draft with editorial notes going
back 4 to 5 years that still aren't resolved. In other words, this
document isn't relevant to the real world. Yet.

 They suggest: URI opacity    'Agents making use of URIs SHOULD NOT attempt 
 to infer properties of the referenced resource.'

Well, as a RESTafarian I understand this argument quite well. It's
about not assuming too much from the internal structure of the URI.
Again, it's an identifier, not a scheme such as a URL where structure
is defined. Again, for URIs, don't assume structure because at this
point it isn't a URL.

 If I get a URI representing (eg) a Sudoc (or an ISSN, or an LCCN), I need to
 be able to tell from the URI alone that it IS a Sudoc, AND I need to be able
 to extract the actual SuDoc identifier from it.  That completely violates 
 their
 Opacity requirement

I think you are quite mistaken on this, but before we leap into whether
the web is suitable for SuDoc I'd 

Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-10 Thread Jonathan Rochkind
Well, the thing is, those sem web folks LIKE what has resulted. They think it's 
_good_ that http:// can be resolved with a certain protocol in some cases, but 
can be an arbitrary identifier untied to protocol in others. 

It definitely is convenient in some cases.  

I have mixed feelings, I don't think it's a disaster, but I'm not sure it's 
always a good idea. 

Jonathan

From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Mike Taylor 
[m...@indexdata.com]
Sent: Thursday, April 02, 2009 2:33 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] 
registering info: uris?)

An account that has a depressing ring of accuracy to it.

Ray Denenberg, Library of Congress writes:
  You're right, if there were a web:  URI scheme, the world would be a
  better place.   But it's not, and the world is worse off for it.
 
  It shouldn't surprise anyone that I am sympathetic to Karen's criticisms.
  Here is some of my historical perspective (which may well differ from
  others').
 
  Back in the old days, URIs (or URLs)  were protocol based.  The ftp scheme
  was for retrieving documents via ftp. The telnet scheme was for telnet. And
  so on.   Some of you may remember the ZIG (Z39.50 Implementors Group) back
  when we developed the z39.50 URI scheme, which was around 1995. Most of us
  were not wise to the ways of the web that long ago, but we were told, by
  those who were, that z39.50r: and z39.50s:  at the beginning of a URL
  are explicit indications that the URI is to be resolved by Z39.50.
 
  A few years later the semantic web was conceived and a lot of SW people began
  coining all manner of http URIs that had nothing to do with the http
  protocol.   By the time the rest of the world noticed, there were so many
  that it was too late to turn back. So instead, history was altered.  The
  company line became "we never told you that the URI scheme was tied to a
  protocol".
 
  Instead, they should have bit the bullet and coined a new scheme.  They
  didn't, and that's why we're in the mess we're in.
 
  --Ray
 
 
  - Original Message -
  From: Houghton,Andrew hough...@oclc.org
  To: CODE4LIB@LISTSERV.ND.EDU
  Sent: Thursday, April 02, 2009 9:41 AM
  Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB]
  registering info: uris?)
 
 
   From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
   Karen Coyle
   Sent: Wednesday, April 01, 2009 2:26 PM
   To: CODE4LIB@LISTSERV.ND.EDU
   Subject: Re: [CODE4LIB] resolution and identification (was Re:
   [CODE4LIB] registering info: uris?)
  
   This really puzzles me, because I thought http referred to a protocol:
   hypertext transfer protocol. And when you put "http://" in front of
   something you are indicating that you are sending the following string
   along to be processed by that protocol. It implies a certain application
   over the web, just as "mailto:" implies a particular application. Yes,
   http is the URI for the hypertext transfer protocol. That doesn't
   negate the fact that it indicates a protocol.
  
   RFC 3986 (URI generic syntax) says that http: is a URI scheme not a
   protocol.  Just because it says http people make all kinds of
   assumptions about type of use, persistence, resolvability, etc.  As I
   indicated in a prior message, whoever registered the http URI scheme
   could have easily used the token web: instead of http:.  All the
   URI scheme in RFC 3986 does is indicate what the syntax of the rest
   of the URI will look like.  That's all.  You give an excellent
   example: mailto.  The mailto URI scheme does not imply a particular
   application.  It is a URI scheme with a specific syntax.  That URI
   is often resolved with the SMTP (mail) protocol.  Whoever registered
   the mailto URI scheme could have specified the token as smtp:
   instead of mailto:.
  
   My reading of Cool URIs is
   that they use the protocol, not just the URI. If they weren't intended
   to take advantage of http then W3C would have used something else as a
   URI. Read through the Cool URIs document and it's not about identifiers,
   it's all about using the *protocol* in service of identifying. Why use
   http?
  
   I'm assuming here that when you say "My reading of Cool URIs..." you mean
   the Cool URIs for the Semantic Web document and not the Cool URIs Don't
   Change document.  The Cool URIs for the Semantic Web document is about
   linked data.  Tim Berners-Lee's four linked data principles state:
  
 1. Use URIs as names for things.
 2. Use HTTP URIs so that people can look up those names.
 3. When someone looks up a URI, provide useful information.
 4. Include links to other URIs, so that they can discover more things.
  
   (2) is an important aspect to linking.  The Web is a hypertext based system
   that uses HTTP URIs to identify resources.  If you want to link, then you

Re: [CODE4LIB] registering info: uris?

2009-04-07 Thread Eric Hellman
no, that's not at all what it implies. the ofi/name identifiers were  
minted as identifiers for namespaces of identifiers, not as a wrapper
scheme for the identifiers themselves. Yes, it's a bit TOO meta, but  
they can be safely ignored unless a new profile is desired.



On Apr 5, 2009, at 10:31 AM, Karen Coyle wrote:


Jonathan Rochkind wrote:


URI for an ISBN or SuDocs?  I don't think the GPO is going  
anywhere, but the GPO isn't committing to supporting an http URI  
scheme, and whoever is, who knows if they're going anywhere. That  
issue is certainly mitigated by Ross using purl.org for these,  
instead of his own personal http URI. But another issue that makes  
us want a controlling authority is increasing the chances that  
everyone will use the _same_ URI.  If GPO were behind the
purl.org/NET/sudoc URIs, those chances would be high. Just Ross on his own,
the chances go down, later someone else (OCLC, GPO, some other guy  
like Ross) might accidentally create a 'competitor', which would be  
unfortunate. Note this isn't as much of a problem for born web  
resources -- nobody's going to accidentally create an alternate URI  
for a dbpedia term, because anybody that knows about dbpedia knows  
that it lives at dbpedia.


So those are my thoughts. Now everyone else can argue bitterly over  
them for a while. :)


The ones that really puzzle me, however, are the OpenURL info  
namespace URIs for ftp, http, https and info. This implies that  
EVERY identifier used by OpenURL needs an info URI, even if it is a  
URI in its own right. They are under info:ofi/nam, which is called
"Namespace reserved for registry identifiers of namespaces". There's
something so circular about this that I just get a brain dump when I  
try to understand it. Does it make sense to anyone?


kc


--
---
Karen Coyle / Digital Library Consultant
kco...@kcoyle.net http://www.kcoyle.net
ph.: 510-540-7596   skype: kcoylenet
fx.: 510-848-3913
mo.: 510-435-8234



Eric Hellman
http://hellman.net/eric/


Re: [CODE4LIB] registering info: uris?

2009-04-06 Thread Jonathan Rochkind

Karen Coyle wrote:


The ones that really puzzle me, however, are the OpenURL info namespace 
URIs for ftp, http, https and info. This implies that EVERY 
identifier used by OpenURL needs an info URI, even if it is a URI in its 
own right. They are under info:ofi/nam, which is called "Namespace
reserved for registry identifiers of namespaces". There's something so
circular about this that I just get a brain dump when I try to 
understand it. Does it make sense to anyone?
  

No, it does not make sense to anyone, as far as I can tell.

Jonathan





kc


  


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Ray Denenberg, Library of Congress

No,  not identical URIs.

Let's say I've put a copy of the schema permanently at each of the following 
locations.

http://www.loc.gov/standards/mods/v3/mods-3-3.xsd
http://www.acme.com//mods-3-3.xsd
http://www.takoma.org/standards/mods-3-3.xsd

Three locations, three URIs.

But the issue of redirect or even resolution is irrelevant in the use case 
I'm citing.   I'm talking about the use of an identifier within a protocol, 
for the sole purpose of identifying an object that the recipient of the URI 
already has - or if it doesn't have it it isn't going to retrieve it, it 
will just fail the request.   The purpose of the identifier is to enable the 
server to determine whether it has the schema that the client is looking 
for.  (And by the way that should answer Ed's question about a use case.)


So the server has some table of schemas, in that table is the row:

[mods schema]   [ URI identifying the mods schema]

It receives the SRU request:
http://z3950.loc.gov:7090/voyager?version=1.1&operation=searchRetrieve&query=dinosaur&maximumRecords=1&recordSchema=[URI identifying the mods schema]


If the URI identifying the MODS schema in the request matches the URI in 
the table, then the server knows what schema the client wants, and it
proceeds.  If there are multiple identifiers then it has to have a row in 
its table for each.
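The table lookup described above could be sketched roughly like this. It is only an illustration: the function name, the internal schema name "mods", and the choice of which identifiers appear as rows are assumptions, and the info: identifier is modelled on the info:srw/schema/1/... pattern quoted elsewhere in the thread rather than taken from any registry.

```python
from urllib.parse import urlsplit, parse_qs

# Each identifier the server is willing to recognise gets its own row,
# even when several rows point at the same schema.
SCHEMA_TABLE = {
    "info:srw/schema/1/mods-v3.3": "mods",  # assumed identifier, for illustration
    "http://www.loc.gov/standards/mods/v3/mods-3-3.xsd": "mods",
}


def requested_schema(request_url):
    """Return the internal schema name for recordSchema, or None if unknown."""
    params = parse_qs(urlsplit(request_url).query)
    values = params.get("recordSchema")
    if not values:
        return None
    # Plain string comparison: the identifier is never dereferenced.
    return SCHEMA_TABLE.get(values[0])
```

A request naming an identifier that has no row simply fails to match, which is the behaviour described: the server either already has the schema under that name or it fails the request.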


Does that make sense?

--Ray


- Original Message - 
From: Ross Singer rossfsin...@gmail.com

To: CODE4LIB@LISTSERV.ND.EDU
Sent: Wednesday, April 01, 2009 2:07 PM
Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] 
registering info: uris?)




Ray, you are absolutely right.  These would be bad identifiers.  But
let's say they're all identical (which I think is what you're saying,
right?), then this just strengthens the case for indirection through a
service like purl.org.  Then it doesn't *matter* that all of these are
different locations, there is one URI that represents the concept of
what is being kept at these locations.  At the end of the redirect can
be some sort of 300 response that lets the client pick which endpoint
is right for them -or arbitrarily chooses one for them.

-Ross.

On Wed, Apr 1, 2009 at 1:59 PM, Ray Denenberg, Library of Congress
r...@loc.gov wrote:

We do just fine minting our URIs at LC, Andy. But we do appreciate your
concern.

The analysis of our MODS URIs misses the point, I'm afraid. Let's forget
the set I cited (bad example) and assume that the schema is replicated at
several locations (geographically dispersed) all of which are planned to
house the specific version permanently. The suggestion to designate one as
canonical is a good suggestion but it isn't always possible (for various
reasons, possibly political). So I maintain that in this scenario you have
several *locations* none of which serves well as an identifier. I'm not
arguing (here) that info is better than http (for this scenario) just that
these are not good identifiers.

--Ray

- Original Message - From: Houghton,Andrew hough...@oclc.org
To: CODE4LIB@LISTSERV.ND.EDU
Sent: Wednesday, April 01, 2009 1:21 PM
Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB]
registering info: uris?)



From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Karen Coyle
Sent: Wednesday, April 01, 2009 1:06 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] resolution and identification (was Re:
[CODE4LIB] registering info: uris?)

The general convention is that "http://" is a web address, a location. I
realize that it's also a form of URI, but that's a minority use of http.
This leads to a great deal of confusion. I understand the desire to use
domain names as a way to create unique, managed identifiers, but the
http part is what is causing us problems.


http:// is an HTTP URI, defined by RFC 3986, loosely I will agree that
it is a web address. However, it is not a location. URIs according
to RFC 3986 are just tokens to identify resources. These tokens, e.g.,
URIs are presented to protocol mechanisms as part of the dereferencing
process to locate and retrieve a representation of the resource.

People see http: and assume that it means the HTTP protocol so it must
be a locator. Whoever initially registered the HTTP URI scheme could
have used web as the token instead and we would all be doing:
web://example.org/. This is the confusion. People don't understand
what RFC 3986 is saying. It makes no claim that any URI registered
scheme has persistence or can be dereferenced. An HTTP URI is just a
token to identify some resource, nothing more.


Andy.




Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Ray Denenberg, Library of Congress
 Sent: Wednesday, April 01, 2009 1:59 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] resolution and identification (was Re:
 [CODE4LIB] registering info: uris?)
 
 We do just fine minting our URIs at LC, Andy. But we do appreciate your
 concern.

Sorry Ray, that statement wasn't directed at LC in particular, but was a 
general statement.  OCLC doesn’t do any better in this area, especially 
with WorldCat where there are the same issues I pointed out with your 
examples and additional issues to boot.  The point I was trying to make
was *all* organizations need to have clear policies on creating and
maintaining URIs, their persistence, etc.  Failure to do so creates a big mess
that takes time to fix, often creating headaches for those using an
organization's URIs.  Take for example when NISO redesigned their site
and broke all the URIs to their standards.  Tim Berners-Lee addresses
this in his Cool URIs Don't Change article.

 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Ross Singer
 Sent: Wednesday, April 01, 2009 2:07 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] resolution and identification (was Re:
 [CODE4LIB] registering info: uris?)
 
 Ray, you are absolutely right.  These would be bad identifiers.  But
 let's say they're all identical (which I think is what you're saying,
 right?), then this just strengthens the case for indirection through a
 service like purl.org.  Then it doesn't *matter* that all of these are
 different locations, there is one URI that represents the concept of
 what is being kept at these locations.  At the end of the redirect can
 be some sort of 300 response that lets the client pick which endpoint
 is right for them -or arbitrarily chooses one for them.

Exactly, but purl.org is just using standard HTTP protocol mechanisms 
which could be easily done by LC's site given Ray's examples.

What is at issue is the identification of a Real World Object URI for
MODS v3.3.  Whether I get back an XML schema, a RelaxNG schema, etc.
are just Web Documents or representations of that abstract Real World 
Object.  What Ross did was make the PURL the Real World Object URI for
MODS v3.3 and used it to redirect to the geographically distributed
Web Documents, e.g., representations.  LC could have just as well
minted one under its own domain.
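As a rough sketch of the indirection being discussed (one identifier, several copies, the server answering with a redirect or a 300), something like the following would do. The path "/mods/v3.3", the table, and the function are all hypothetical; the two locations are taken from Ray's example, and this is not how the actual PURL software is implemented.

```python
from http import HTTPStatus

# Hypothetical table: one identifier, several copies of the same document.
COPIES = {
    "/mods/v3.3": [
        "http://www.loc.gov/standards/mods/v3/mods-3-3.xsd",
        "http://www.takoma.org/standards/mods-3-3.xsd",
    ],
}


def resolve(path, pick_first=False):
    """Return (status, headers, body) for a request to the identifier's path."""
    locations = COPIES.get(path)
    if not locations:
        return HTTPStatus.NOT_FOUND, {}, ""
    if len(locations) == 1 or pick_first:
        # The server chooses on the client's behalf.
        return HTTPStatus.FOUND, {"Location": locations[0]}, ""
    # Otherwise let the client choose among the alternatives.
    body = "\n".join(locations)
    return HTTPStatus.MULTIPLE_CHOICES, {"Content-Type": "text/uri-list"}, body
```

Nothing here is specific to purl.org; any HTTP server that can be configured to send these status codes could play the same role under its own domain.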


Andy.


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Karen Coyle
 Sent: Wednesday, April 01, 2009 2:26 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] resolution and identification (was Re:
 [CODE4LIB] registering info: uris?)
 
 This really puzzles me, because I thought http referred to a protocol:
 hypertext transfer protocol. And when you put "http://" in front of
 something you are indicating that you are sending the following string
 along to be processed by that protocol. It implies a certain application
 over the web, just as "mailto:" implies a particular application. Yes,
 http is the URI for the hypertext transfer protocol. That doesn't
 negate the fact that it indicates a protocol.

RFC 3986 (URI generic syntax) says that http: is a URI scheme not a
protocol.  Just because it says http people make all kinds of 
assumptions about type of use, persistence, resolvability, etc.  As I
indicated in a prior message, whoever registered the http URI scheme
could have easily used the token web: instead of http:.  All the
URI scheme in RFC 3986 does is indicate what the syntax of the rest
of the URI will look like.  That's all.  You give an excellent
example: mailto.  The mailto URI scheme does not imply a particular
application.  It is a URI scheme with a specific syntax.  That URI
is often resolved with the SMTP (mail) protocol.  Whoever registered
the mailto URI scheme could have specified the token as smtp:
instead of mailto:.
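A small illustration of the "scheme only tells you the syntax" point, using Python's standard URI splitter. The helper name and the mailto address are made up for the example; the info: URI is one quoted earlier in the thread. Note that splitting involves no network access at all.

```python
from urllib.parse import urlsplit


def describe(uri):
    """Split a URI into (scheme, authority, path) without dereferencing it."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path


if __name__ == "__main__":
    for uri in (
        "http://example.org/",
        "mailto:someone@example.org",
        "info:srw/schema/1/marcxml-v1.1",
    ):
        print(describe(uri))
```

The http URI has an authority component and a path; the mailto and info URIs have only a scheme-specific path. Which protocol, if any, later gets used to resolve them is a separate decision.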

 My reading of Cool URIs is
 that they use the protocol, not just the URI. If they weren't intended
 to take advantage of http then W3C would have used something else as a
 URI. Read through the Cool URIs document and it's not about identifiers,
 it's all about using the *protocol* in service of identifying. Why use
 http?

I'm assuming here that when you say "My reading of Cool URIs..." you mean
the Cool URIs for the Semantic Web document and not the Cool URIs Don't
Change document.  The Cool URIs for the Semantic Web document is about
linked data.  Tim Berners-Lee's four linked data principles state:

   1. Use URIs as names for things.
   2. Use HTTP URIs so that people can look up those names.
   3. When someone looks up a URI, provide useful information.
   4. Include links to other URIs, so that they can discover more things.

(2) is an important aspect to linking.  The Web is a hypertext based system
that uses HTTP URIs to identify resources.  If you want to link, then you 
need to use HTTP URIs.  There is only one protocol, today, that accepts 
HTTP URIs as currency and it's appropriately called HTTP and defined by
RFC 2616.

The Cool URIs for the Semantic Web document describes how an HTTP protocol
implementation (of RFC 2616) should respond to a dereference of an HTTP URI.
It's important to understand that URIs are just tokens that *can* be presented
to a protocol for resolution.  It's up to the protocol to define the currency
that it will accept, e.g., HTTP URIs, and it's up to an implementation of the
protocol to define the tokens of that currency that it will accept.

It just so happens that HTTP URIs are accepted by the HTTP protocol, but in
the case of mailto URIs they are accepted by the SMTP protocol.  However,
it is important to note that a HTTP user agent, e.g., a browser, accepts
both HTTP and mailto URIs.  It decides that it should send the mailto URI
to an SMTP user agent, e.g., Outlook, Thunderbird, etc. or it should
dereference the HTTP URI with the HTTP protocol.  In fact the HTTP protocol
doesn't directly accept HTTP URIs.  As part of the dereference process the
HTTP user agent needs to break apart the HTTP URI and present it to the HTTP
protocol.  For example the HTTP URI: http://example.org/ becomes the HTTP 
protocol request:

GET / HTTP/1.1
Host: example.org
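The "break apart the HTTP URI" step can be sketched in a few lines. This is a simplified illustration (the function name is invented, and a real user agent would also handle ports, https, and additional headers), but it shows the URI itself never going on the wire as a single token:

```python
from urllib.parse import urlsplit


def http_request_for(uri):
    """Build the bare GET request a user agent would send for an http URI."""
    parts = urlsplit(uri)
    if parts.scheme != "http":
        raise ValueError("not an http URI: " + uri)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # The authority goes into the Host header, the rest into the request line.
    return "GET {} HTTP/1.1\r\nHost: {}\r\n\r\n".format(target, parts.netloc)


if __name__ == "__main__":
    print(http_request_for("http://example.org/"))
```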

Think of a URI as a minted token.  The New York subway mints tokens to ride 
the subway to get to a destination.  Placing a U.S. quarter or a Boston
subway token in a turnstile will not allow you to pass.  You must use the
New York subway minted token, e.g., currency.  URIs are the same.  OCLC 
can mint HTTP URI tokens and LC can mint HTTP URI tokens, both are using
the HTTP URI currency, but sending LC HTTP URI tokens, e.g., Boston subway
tokens, to OCLC's Web server will most likely result in a 404, you cannot
pass since OCLC's Web server only accepts OCLC tokens, e.g., New York subway
tokens, that identify a resource under its control.


Andy.


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Mike Taylor
 Sent: Thursday, April 02, 2009 8:41 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] resolution and identification (was Re:
 [CODE4LIB] registering info: uris?)
 
 I have to say I am suspicious of schemes like PURL, which for all
 their good points introduce a single point of failure into, well,
 everything that uses them.  That can't be good.  Especially as it's
 run by the same company that also runs the often-unavailable OpenURL
 registry.

What you are saying is that you are suspicious of the HTTP protocol.  All
the PURL server does is use mechanisms specified by the HTTP protocol.
Any HTTP server is capable of implementing those same mechanisms.  The
actual PURL server is a community based service that allows people to
create HTTP URIs that redirect to other URIs without having to run an 
actual HTTP server.  If you don't like its single point of failure, then 
create your own in-house service using your existing HTTP server.  I 
believe the source code for the entire PURL service is freely available 
and other people have taken the opportunity to run their own in-house or 
community based service.


Andy.


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Mike Taylor
Houghton,Andrew writes:
   I have to say I am suspicious of schemes like PURL, which for all
   their good points introduce a single point of failure into, well,
   everything that uses them.  That can't be good.  Especially as
   it's run by the same company that also runs the often-unavailable
   OpenURL registry.
  
  What you are saying is that you are suspicious of the HTTP protocol.

That is NOT what I am saying.

I am saying I am suspicious of a single point of failure.  Especially
since the entire architecture of the Internet was (rightly IMHO)
designed with the goal of avoiding SPOFs.

 _/|____
/o ) \/  Mike Taylor    m...@indexdata.com    http://www.miketaylor.org.uk
)_v__/\  In My Egotistical Opinion, most people's C programs should
 be indented six feet downward and covered with dirt -- Blair
 P. Houghton.


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Karen Coyle

Houghton,Andrew wrote:

RFC 3986 (URI generic syntax) says that http: is a URI scheme not a
protocol.  Just because it says http people make all kinds of 
assumptions about type of use, persistence, resolvability, etc.




And RFC 2616 (Hypertext transfer protocol) says:

The HTTP protocol is a request/response protocol. A client sends a 
request to the server in the form of a request method, URI, and protocol 
version, followed by a MIME-like message containing request modifiers, 
client information, and possible body content over a connection with a 
server.


So what you are saying is that it's ok to use the URI for the hypertext 
transfer protocol in a way that ignores RFC 2616. I'm just not sure how 
functional that is, in the grand scheme of things. And when you say:



The Cool URIs for the Semantic Web document describes how an HTTP protocol
implementation (of RFC 2616) should respond to a dereference of an HTTP URI.


I think you are deliberately distorting the intent of the Cool URIs
document. You seem to read it that *given* an http uri, here is how the
protocol should respond. But in fact the Cool URIs document asks the
question "So the question is, what URIs should we use in RDF?" and
responds that one should use http URIs for the reason that:


Given only a URI, machines and people should be able to retrieve a 
description about the resource identified by the URI from the Web. Such 
a look-up mechanism is important to establish shared understanding of 
what a URI identifies. Machines should get RDF data and humans should 
get a readable representation, such as HTML. The standard Web transfer 
protocol, HTTP, should be used.


So it doesn't just say how to respond to an http URI; it says to use 
http URIs *because* there is a useful possible response. That's a very 
different statement. It is significant that (as Mike pointed out, perhaps
inadvertently) no one is using mailto: or ftp: as identifiers. That's 
not a coincidence.
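The "useful possible response" in the passage quoted above (RDF for machines, HTML for humans) is normally chosen from the request's Accept header. A deliberately naive sketch of that choice, with a made-up function name, ignoring q-values and wildcards, and assuming HTML as the fallback:

```python
def choose_representation(accept_header):
    """Pick a media type for a description of the identified resource.

    Deliberately naive: ignores q-values and wildcards.
    """
    offered = [part.split(";")[0].strip() for part in accept_header.split(",")]
    for media_type in offered:
        if media_type in ("application/rdf+xml", "text/html"):
            return media_type
    return "text/html"  # assumed default, for humans with browsers
```

The same http URI thus serves as the identifier in both cases; only the representation returned for it differs.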


kc

--
---
Karen Coyle / Digital Library Consultant
kco...@kcoyle.net http://www.kcoyle.net
ph.: 510-540-7596   skype: kcoylenet
fx.: 510-848-3913
mo.: 510-435-8234



Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Mike Taylor
Houghton,Andrew writes:
  I have to say I am suspicious of schemes like PURL, which
  for all their good points introduce a single point of
  failure into, well, everything that uses them.  That can't
  be good.  Especially as it's run by the same company that
  also runs the often-unavailable OpenURL registry.

 What you are saying is that you are suspicious of the HTTP
 protocol.
   
   That is NOT what I am saying.
   
   I am saying I am suspicious of a single point of failure.
   Especially since the entire architecture of the Internet was
   (rightly IMHO) designed with the goal of avoiding SPOFs.
  
  OK, good, then if you are concerned about the PURL services SPOF,
  take the freely available PURL software and create a distributed
  PURL based system and put it up for the community.

Why would  I want to do this when I could just Not Use PURLs?

Anyway, we're way off the subject now -- I guess if we want to argue
about the utility of PURL we could get a room :-)


 _/|____
/o ) \/  Mike Taylor  m...@indexdata.com  http://www.miketaylor.org.uk
)_v__/\  The cladistic definition of Aves is: an unimportant offshoot of
 the much cooler dinosaur family which somehow managed to survive
 the K/T boundary intact -- Eric Lurio.


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Karen Coyle

Houghton,Andrew wrote:


OK, good, then if you are concerned about the PURL services SPOF, take 
the freely available PURL software and create a distributed PURL based 
system and put it up for the community.  I think several people have

looked at this, but I have not heard of any progress or implementations.


Andy.

  


The California Digital Library ran the PURL software for a while, using 
it to mint identifiers for digital documents. It was a while back, but 
someone there may remember how it went.


kc

--
---
Karen Coyle / Digital Library Consultant
kco...@kcoyle.net http://www.kcoyle.net
ph.: 510-540-7596   skype: kcoylenet
fx.: 510-848-3913
mo.: 510-435-8234



Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Karen Coyle
 Sent: Thursday, April 02, 2009 10:15 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] resolution and identification (was Re:
 [CODE4LIB] registering info: uris?)
 
 Houghton,Andrew wrote:
  RFC 3986 (URI generic syntax) says that http: is a URI scheme not a
  protocol.  Just because it says http people make all kinds of
  assumptions about type of use, persistence, resolvability, etc.
 
 
 And RFC 2616 (Hypertext transfer protocol) says:
 
 The HTTP protocol is a request/response protocol. A client sends a
 request to the server in the form of a request method, URI, and
 protocol
 version, followed by a MIME-like message containing request modifiers,
 client information, and possible body content over a connection with a
 server.
 
 So what you are saying is that it's ok to use the URI for the hypertext
 transfer protocol in a way that ignores RFC 2616. I'm just not sure how
 functional that is, in the grand scheme of things.

You missed the whole point that URIs, specified by RFC 3986, are just tokens
that are divorced from protocols, like RFC 2616, but often work in conjunction
with them to retrieve a representation of the resource defined by the URI
scheme.  It is up to the protocol to decide which URI schemes that it will 
accept.  In the case of RFC 2616, there is a one-to-one relationship, today,
with the HTTP URI scheme.  RFC 2616 could also have said it would accept other 
URI schemes too or another protocol could be defined, tomorrow, that also 
accepts the HTTP URI scheme, causing the HTTP URI scheme to have a one-to-many 
relationship between its scheme and protocols that accept its scheme.
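The point that a URI scheme is only a token, and that each protocol decides which schemes it accepts, can be sketched in a few lines of Python. This is an illustration only: the handler names and the ACCEPTED_SCHEMES table are made up for the example, and the mapping is an assumption rather than anything defined by RFC 3986 or RFC 2616.

```python
from urllib.parse import urlsplit

# Hypothetical table: which URI schemes each protocol handler accepts.
# Nothing in the URI syntax itself fixes this mapping.
ACCEPTED_SCHEMES = {
    "http-client": {"http"},
    "mail-client": {"mailto"},
}


def handlers_for(uri):
    """Return the names of the handlers that accept this URI's scheme."""
    scheme = urlsplit(uri).scheme
    return sorted(
        name for name, schemes in ACCEPTED_SCHEMES.items() if scheme in schemes
    )
```

Under these assumptions an info URI such as info:srw/schema/1/marcxml-v1.1 parses as a perfectly good URI but matches no handler, which is the sense in which it is an identifier without a protocol. Adding a second handler that also lists "http" would give the one-to-many relationship described above.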

 And when you say:
 
  The Cool URIs for the Semantic Web document describes how an HTTP
 protocol
  implementation (of RFC 2616) should respond to a dereference of an
 HTTP URI.
 
 I think you are deliberately distorting the intent of the Cool URIs
 document. You seem to read it that *given* an http uri, here is how the
 protocol should respond. But in fact the Cool URIs document asks the
 question "So the question is, what URIs should we use in RDF?" and
 responds that one should use http URIs for the reason that:
 
 Given only a URI, machines and people should be able to retrieve a
 description about the resource identified by the URI from the Web. Such
 a look-up mechanism is important to establish shared understanding of
 what a URI identifies. Machines should get RDF data and humans should
 get a readable representation, such as HTML. The standard Web transfer
 protocol, HTTP, should be used.

The answer to the question posed in the document is based on Tim 
Berners-Lee's four linked data principles, one of which states to 
use HTTP URIs.  Nobody, as far as I know, has created a hypertext 
based system based on the URN or info URI schemes.  The only 
hypertext based system available today is the Web which is based on 
the HTTP protocol that accepts HTTP URIs.  So you cannot effectively 
accomplish linked data on the Web without using HTTP URIs.

The document has an RDF / Semantic Web slant, but Tim Berners-Lee's 
four linked data principles say nothing about RDF or the Semantic Web.  
Those four principles might be more aptly named the four linked 
information principles for the Web.  Further, the document does go on 
to describe how an HTTP server (an implementation of RFC 2616) should 
respond to requests for Real World Objects, Generic Documents and Web 
Documents, which is based on the W3C TAG decisions for httpRange-14 and 
genericResources-53.

The scope of the document clearly says:

  This document is a practical guide for implementers of the RDF 
   specification... It explains two approaches for RDF data hosted 
   on HTTP servers...

Section 2.1 discusses HTTP and content negotiation for Generic Documents.

Section 4 discusses how the HTTP server should respond with diagrams and
actual HTTP status codes to let user agents know which URIs are Real
World Objects vs. Generic Documents and Web Documents, per the W3C TAG
decisions on httpRange-14 and genericResources-53.
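As a rough illustration of the kind of server behaviour Section 4 describes, here is a minimal Python sketch. It assumes the usual reading of the httpRange-14 decision: a URI for a Real World Object answers with 303 See Other pointing at a describing document, and a document URI answers 200 with a representation chosen from the Accept header. The paths, the RESOURCES table, and the very crude Accept check are all hypothetical, for illustration only.

```python
# Hypothetical resource table; the paths and kinds are made up for illustration.
RESOURCES = {
    "/id/alice": {"kind": "thing", "doc": "/doc/alice"},
    "/doc/alice": {"kind": "document"},
}


def respond(path, accept="text/html"):
    """Return (status, headers) for a GET on `path`."""
    entry = RESOURCES.get(path)
    if entry is None:
        return 404, {}
    if entry["kind"] == "thing":
        # A real-world object cannot be sent over the wire, so redirect
        # to a document that describes it.
        return 303, {"Location": entry["doc"]}
    # A document: pick a representation from the Accept header.
    # (Simplified: a real server would honour q-values and more types.)
    if "application/rdf+xml" in accept:
        content_type = "application/rdf+xml"
    else:
        content_type = "text/html"
    return 200, {"Content-Type": content_type, "Vary": "Accept"}
```

A user agent that sees the 303 knows the original URI may name something other than a document, while a 200 tells it that it dereferenced a document directly.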

Section 6 directly addresses the question that this thread has been talking
about, namely using new URI schemes, like URN and info and why they are
not acceptable in the context of linked data.

And here is a quote which is what I have said over and over again about
URIs being tokens and divorced from protocols:

  To be truly useful, a new scheme must be accompanied by a protocol 
   defining how to access more information about the identified resource.
   For example, the ftp:// URI scheme identifies resources (files on an 
   FTP server), and also comes with a protocol for accessing them (the 
   FTP protocol).

  Some of the new URI schemes provide no such protocol at all. Others 
   provide a Web Service that allows retrieval of descriptions using the 
   HTTP protocol. The identifier is passed to the service, which looks up

Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Mike Taylor
Karen Coyle writes:
   OK, good, then if you are concerned about the PURL services SPOF,
   take the freely available PURL software and create a distributed
   PURL based system and put it up for the community.  I think
   several people have looked at this, but I have not heard of any
   progress or implementations.
  
  The California Digital Library ran the PURL software for a while,
  using it to mint identifiers for digital documents. It was a while
  back, but someone there may remember how it went.

Wait, what?  They _were_ running a PURL resolver, but now they're not?
What does the P in PURL stand for again?

 _/|____
/o ) \/  Mike Taylor  m...@indexdata.com  http://www.miketaylor.org.uk
)_v__/\  Wagner's music is nowhere near as bad as it sounds -- Mark
 Twain.


Re: [CODE4LIB] registering info: uris?

2009-04-02 Thread Erik Hetzner
At Thu, 2 Apr 2009 13:47:50 +0100,
Mike Taylor wrote:
 
 Erik Hetzner writes:
   Without external knowledge that info:doi/10./xxx is a URI, I can
   only guess.
 
 Yes, that is true.  The point is that by specifying that the rft_id
 has to be a URI, you can then use other kinds of URI without needing
 to broaden the specification.  So:
   info:doi/10./j.1475-4983.2007.00728.x
   urn:isbn:1234567890
   ftp://ftp.indexdata.com/pub/yaz
 
 [Yes, I am throwing in an ftp: URL as an identifier just because I can
 -- please let's not get sidetracked by this very bad idea :-) ]

 This is not just hypothetical: the flexibility is useful and the
 encapsulation of the choice within a URI is helpful. I maintain an
 OpenURL resolver that handles rft_id's by invoking a plugin
 depending on what the URI scheme is; for some URI schemes, such as
 info:, that then invokes another, lower-level plugin based on the
 type (e.g. doi in the example above). Such code is straightforward
 to write, simple to understand, easy to maintain, and nice to extend
 since all you have to do is provide one more encapsulated plugin.
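The two-level plugin arrangement described above might look roughly like the following Python sketch. It is not the code of the resolver mentioned in the message; the function names, the plugin tables, and the return values are all hypothetical, chosen only to show dispatch first on the URI scheme and then, for info: URIs, on the namespace.

```python
def _dispatch(plugins, key, value):
    """Look up a plugin by key (case-insensitively) and hand it the value."""
    plugin = plugins.get(key.lower())
    if plugin is None:
        raise LookupError(f"no plugin registered for {key!r}")
    return plugin(value)


# Hypothetical leaf plugins: real ones would look the identifier up somewhere.
def resolve_doi(doi):
    return ("doi", doi)


def resolve_isbn(isbn):
    return ("isbn", isbn)


INFO_PLUGINS = {"doi": resolve_doi}    # lower level: info: namespaces
URN_PLUGINS = {"isbn": resolve_isbn}   # lower level: URN namespace IDs


def resolve_info(rest):
    # info:doi/10.xxx -> namespace "doi", identifier "10.xxx"
    namespace, _, identifier = rest.partition("/")
    return _dispatch(INFO_PLUGINS, namespace, identifier)


def resolve_urn(rest):
    # urn:isbn:123 -> namespace ID "isbn", namespace-specific string "123"
    nid, _, nss = rest.partition(":")
    return _dispatch(URN_PLUGINS, nid, nss)


SCHEME_PLUGINS = {"info": resolve_info, "urn": resolve_urn}  # top level


def resolve_rft_id(rft_id):
    """Pick a plugin from the URI scheme and pass it the rest of the URI."""
    scheme, sep, rest = rft_id.partition(":")
    if not sep:
        raise ValueError(f"{rft_id!r} has no scheme, so it is not a URI")
    return _dispatch(SCHEME_PLUGINS, scheme, rest)
```

Supporting a new kind of identifier then means adding one function and one table entry. In this sketch an ftp: URI raises LookupError simply because no plugin is registered for that scheme.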

Thanks for the clarification. Honestly I was also responding to Rob
Sanderson’s message (bad practice, surely) where he described URIs as
‘self-describing’, which seemed to me unclear. URIs are only
self-describing insofar as they describe what type of URI they are.

I think that all of us in this discussion like URIs. I can’t speak
for, say, Andrew, but, tentatively, I think that I prefer
info:doi/10./xxx to plain 10.111/xxx. I would just prefer
http://dx.doi.org/10./xxx

   (Caveat: I have no idea what rft_id, etc, means, so maybe that
   changes the meaning of what you are saying from how I read it.)
 
 No, it doesn't :-)  rft_id is the name of the parameter used in
 OpenURL 1.0 to denote a referent ID, which is the same thing I've been
 calling a Thing Identifier elsewhere in this thread.  The point with
 this part of OpenURL is precisely that you can just shove any
 identifier at the resolver and leave it to do the best job it can.
 Your only responsibility is to ensure that the identifier you give it
 is in the form of a URI, so the resolver can use simple rules to pick
 it apart and decide what to do.
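For readers who, like Erik, have not met rft_id before, here is a small Python sketch of passing an identifier to a resolver as a key/value OpenURL 1.0 query. The resolver address is hypothetical, and the sketch assumes the common key/value form in which url_ver=Z39.88-2004 marks the request as OpenURL 1.0; a real request would usually carry more keys.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical resolver address, for illustration only.
RESOLVER_BASE = "http://resolver.example.org/openurl"


def make_openurl(rft_id, base=RESOLVER_BASE):
    """Build an OpenURL whose referent is identified only by a URI."""
    query = urlencode({"url_ver": "Z39.88-2004", "rft_id": rft_id})
    return f"{base}?{query}"


def referent_id(openurl):
    """What the resolver sees: the rft_id, decoded back into a URI."""
    return parse_qs(urlsplit(openurl).query)["rft_id"][0]
```

The caller never needs to know what kind of identifier it is passing; any URI goes into the same parameter and the resolver decides what to do with it.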

Thanks.

best,
Erik Hetzner




Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Ray Denenberg, Library of Congress
You're right, if there were a web:  URI scheme, the world would be a 
better place.   But it's not, and the world is worse off for it.


It shouldn't surprise anyone that I am sympathetic to Karen's criticisms. 
Here is some of my historical perspective (which may well differ from 
others').


Back in the old days, URIs (or URLs)  were protocol based.  The ftp scheme 
was for retrieving documents via ftp. The telnet scheme was for telnet. And 
so on.   Some of you may remember the ZIG (Z39.50 Implementors Group) back 
when we developed the z39.50 URI scheme, which was around 1995. Most of us 
were not wise to the ways of the web that long ago, but we were told, by 
those who were, that z39.50r: and z39.50s:  at the beginning of a URL 
are explicit indications that the URI is to be resolved by Z39.50.


A few years later the semantic web was conceived and a lot of SW people began 
coining all manner of http URIs that had nothing to do with the http 
protocol.   By the time the rest of the world noticed, there were so many 
that it was too late to turn back. So instead, history was altered.  The 
company line became we never told you that the URI scheme was tied to a 
protocol.


Instead, they should have bit the bullet and coined a new scheme.  They 
didn't, and that's why we're in the mess we're in.


--Ray


- Original Message - 
From: Houghton,Andrew hough...@oclc.org

To: CODE4LIB@LISTSERV.ND.EDU
Sent: Thursday, April 02, 2009 9:41 AM
Subject: Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] 
registering info: uris?)




From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Karen Coyle
Sent: Wednesday, April 01, 2009 2:26 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] resolution and identification (was Re:
[CODE4LIB] registering info: uris?)

This really puzzles me, because I thought http referred to a protocol:
hypertext transfer protocol. And when you put "http://" in front of
something you are indicating that you are sending the following string
along to be processed by that protocol. It implies a certain application
over the web, just as "mailto:" implies a particular application. Yes,
http is the URI for the hypertext transfer protocol. That doesn't
negate the fact that it indicates a protocol.


RFC 3986 (URI generic syntax) says that http: is a URI scheme not a
protocol.  Just because it says http people make all kinds of
assumptions about type of use, persistence, resolvability, etc.  As I
indicated in a prior message, whoever registered the http URI scheme
could have easily used the token web: instead of http:.  All the
URI scheme in RFC 3986 does is indicate what the syntax of the rest
of the URI will look like.  That's all.  You give an excellent
example: mailto.  The mailto URI scheme does not imply a particular
application.  It is a URI scheme with a specific syntax.  That URI
is often resolved with the SMTP (mail) protocol.  Whoever registered
the mailto URI scheme could have specified the token as smtp:
instead of mailto:.


My reading of Cool URIs is
that they use the protocol, not just the URI. If they weren't intended
to take advantage of http then W3C would have used something else as a
URI. Read through the Cool URIs document and it's not about
identifiers,
it's all about using the *protocol* in service of identifying. Why use
http?


I'm assuming here when you say My reading of Cool URIs... means reading
the Cool URIs for the Semantic Web document and not the Cool URIs Don't
Change document.  The Cool URIs for the Semantic Web document is about
linked data.  Tim Berners-Lee's four linked data principles state:

  1. Use URIs as names for things.
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information.
  4. Include links to other URIs so that they can discover more things.

(2) is an important aspect to linking.  The Web is a hypertext based system
that uses HTTP URIs to identify resources.  If you want to link, then you
need to use HTTP URIs.  There is only one protocol, today, that accepts
HTTP URIs as currency and it's appropriately called HTTP and defined by
RFC 2616.

The Cool URIs for the Semantic Web document describes how an HTTP protocol
implementation (of RFC 2616) should respond to a dereference of an HTTP URI.
It's important to understand the URIs are just tokens that *can* be presented
to a protocol for resolution.  It's up to the protocol to define the currency
that it will accept, e.g., HTTP URIs, and it's up to an implementation of the
protocol to define the tokens of that currency that it will accept.

It just so happens that HTTP URIs are accepted by the HTTP protocol, but in
the case of mailto URIs they are accepted by the SMTP protocol.  However,
it is important to note that a HTTP user agent, e.g., a browser, accepts
both HTTP and mailto URIs.  It decides that it should send the mailto URI
to an SMTP user agent, e.g., Outlook, Thunderbird, etc

Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Mike Taylor
An account that has a depressing ring of accuracy to it.

Ray Denenberg, Library of Congress writes:
  You're right, if there were a web:  URI scheme, the world would be a 
  better place.   But it's not, and the world is worse off for it.
  
  It shouldn't surprise anyone that I am sympathetic to Karen's criticisms. 
  Here is some of my historical perspective (which may well differ from 
  others').
  
  Back in the old days, URIs (or URLs)  were protocol based.  The ftp scheme 
  was for retrieving documents via ftp. The telnet scheme was for telnet. And 
  so on.   Some of you may remember the ZIG (Z39.50 Implementors Group) back 
  when we developed the z39.50 URI scheme, which was around 1995. Most of us 
  were not wise to the ways of the web that long ago, but we were told, by 
  those who were, that z39.50r: and z39.50s:  at the beginning of a URL 
  are explicit indications that the URI is to be resolved by Z39.50.
  
  A few years later the semantic web was conceived and a lot of SW people began 
  coining all manner of http URIs that had nothing to do with the http 
  protocol.   By the time the rest of the world noticed, there were so many 
  that it was too late to turn back. So instead, history was altered.  The 
  company line became we never told you that the URI scheme was tied to a 
  protocol.
  
  Instead, they should have bit the bullet and coined a new scheme.  They 
  didn't, and that's why we're in the mess we're in.
  
  --Ray
  
  

Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Erik Hetzner
Hi Ray -

At Thu, 2 Apr 2009 13:48:19 -0400,
Ray Denenberg, Library of Congress wrote:
 
 You're right, if there were a web:  URI scheme, the world would be a 
 better place.   But it's not, and the world is worse off for it.

Well, the original concept of the ‘web’ was, as I understand it, to
bring together all the existing protocols (gopher, ftp, etc.), with
the new one in addition (HTTP), with one unifying address scheme, so
that you could have this ‘web browser’ that you could use for
everything. So web: would have been nice, but probably wouldn’t have
been accepted.

As it turns out, HTTP won overwhelmingly, and the older protocols died
off.

 It shouldn't surprise anyone that I am sympathetic to Karen's
 criticisms. Here is some of my historical perspective (which may
 well differ from others').
 
 Back in the old days, URIs (or URLs) were protocol based. The ftp
 scheme was for retrieving documents via ftp. The telnet scheme was
 for telnet. And so on. Some of you may remember the ZIG (Z39.50
 Implementors Group) back when we developed the z39.50 URI scheme,
 which was around 1995. Most of us were not wise to the ways of the
 web that long ago, but we were told, by those who were, that
 z39.50r: and z39.50s: at the beginning of a URL are explicit
 indications that the URI is to be resolved by Z39.50.
 
 A few years later the semantic web was conceived and a lot of SW
 people began coining all manner of http URIs that had nothing to do
 with the http protocol. By the time the rest of the world noticed,
 there were so many that it was too late to turn back. So instead,
 history was altered. The company line became we never told you that
 the URI scheme was tied to a protocol.
 
 Instead, they should have bit the bullet and coined a new scheme.  They 
 didn't, and that's why we're in the mess we're in.

Not knowing the details of the history, your account seems correct to
me, except that I don’t think the web people tried to alter history.

I think of the web as having been a learning experience for all of us.
Yes, we used to think that the URI was tied to the protocol. But we
have learned that it doesn’t need to be, that HTTP URIs can be just
identifiers which happen to be dereferenceable at the moment using the
HTTP protocol.

And it became useful to begin identifying lots of things, people and
places and so on, using identifiers, and it also seemed useful to use
a protocol that existed (HTTP), instead of coming up with the
Person-Metadata Transfer Protocol and inventing a new URI scheme
(pmtp://...) to resolve metadata about persons. Because HTTP doesn’t
care what kind of data it is sending down the line; it can happily
send metadata about people.

But that is how things grow; the http:// at the beginning of a URI may
eventually be a spandrel, when HTTP is dead and buried. And people
will wonder why the address http://dx.doi.org/10./xxx has those
funny characters in front of it. And doi.org will be long gone,
because they ran out of money, and their domain was taken over by
squatters, so we all had to agree to alter our browsers to include an
override to not use DNS to resolve the dx.doi.org domain but instead
point to a new, distributed system of DOI resolution.

We will need to fix these problems as they arise.

In my opinion, if we are interested in identifier persistence, clarity
about the difference between things and information about things,
creating a more useful web (of data), and the other things we ought to
be interested in, our time is best spent worrying about these things,
and how they can be built on top of the web. Our time is not well
spent in coming up with new ways to do things that the web already does
for us.

For instance: if there is concern that HTTP URIs are not seen as being
persistent, it would be useful to try to add a method to HTTP which
indicated the persistence of an identifier. This way browsers could
display a little icon that indicated that the URI was persistent. A
user could click on this icon and get information about the
institution which claimed persistence for the URI, what the level of
support was, what other institution could back up that claim, etc.

Our time would not be well spent coming up with an elaborate scheme
for phttp:// URIs, creating a better DNS, with name control by a
better institution, and a better HTTP, with metadata, and a better
caching system, and so on. This is a lot of work and you forget what
you were trying to do in the first place, which is make HTTP URIs
persistent.

best,
Erik
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] registering info: uris?

2009-04-02 Thread Jonathan Rochkind

Rob Sanderson wrote:


info URIs, In My Opinion, are ideally suited for long term identifiers
of non information resources.  But http URIs are definitely better than
something which isn't a URI at all.

  
Through this discussion I am clarifying my thoughts on this too. I feel 
that info URIs are especially suited for identifiers that are not only 
long-term identifiers of non-web resources (an ISBN may identify an 
'information' resource, but it's not a web resource), but also 
especially when in addition all of the following are true:


0) Of potential wide-spread (not just local) interest. Ie, NOT a URI for 
a record in my local catalog.
1) The identifier vocabulary itself pre-dates the web and was not 
designed for the web. (ISBN, SuDoc).
2) There is not a controlling authority for the identifier vocabulary 
that _recognizes_ its responsibility to maintain persistence _and_ has 
the resources to fulfill that responsibility. That could be because:
   a) There is no single controlling authority at all, the control is 
distributed, and they don't all have their coordinated act together for 
a web-world.
   b) The controlling authority hasn't yet realized that these 
identifiers matter for a web world, and doesn't care about URIs.
   c) There's nobody that wants to commit to this because they think 
they can't afford it.



That's what I'm thinking.  URI for a wikipedia concept from dbpedia?  
Sure, use http.  Those aren't going anywhere, because they are 
web-native, they were created to be web-native, the folks that created 
them realize what this means, and as long as their project exists 
they're likely to maintain them, and their project isn't likely to go 
away.


URI for an ISBN or SuDocs?  I don't think the GPO is going anywhere, but 
the GPO isn't committing to supporting an http URI scheme, and whoever 
is, who knows if they're going anywhere. That issue is certainly 
mitigated by Ross using purl.org for these, instead of his own personal 
http URI. But another issue that makes us want a controlling authority 
is increasing the chances that everyone will use the _same_ URI.  If GPO 
were behind the purl.org/NET/sudoc URIs, those chances would be high. 
Just Ross on his own, the chances go down, later someone else (OCLC, 
GPO, some other guy like Ross) might accidentally create a 'competitor', 
which would be unfortunate. Note this isn't as much of a problem for 
born web resources -- nobody's going to accidentally create an 
alternate URI for a dbpedia term, because anybody that knows about 
dbpedia knows that it lives at dbpedia.


So those are my thoughts. Now everyone else can argue bitterly over them 
for a while. :)


And yes, I agree fully that ALL identifiers ought to be expressed as 
_some_ kind of URI.  Once you've done that, you've avoided the most 
important mistake, I think.


Jonathan


Re: [CODE4LIB] registering info: uris?

2009-04-02 Thread Ross Singer
On Thu, Apr 2, 2009 at 3:03 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

 Note this isn't as much of a problem for born web resources -- nobody's
 going to accidentally create an alternate URI for a dbpedia term, because
 anybody that knows about dbpedia knows that it lives at dbpedia.

Unless they use the corresponding URI from Wikipedia or Freebase.

In short, identifiers are based on social contracts and only
validated through use.  Not because some authority or other has
endorsed them, but because they've proliferated through actual, real
world, use.

Different communities might have reason to use a different
identifier that expresses the *exact same thing* because the syntax or
the format better suits their needs.  Or language.  Or environment.

And there's nothing any governing body or standards document can do to
stop them.

It's obviously a bad time to use this term, but identifiers will not
be produced by standards, but by market forces, branding and
momentum.

-Ross.


Re: [CODE4LIB] registering info: uris?

2009-04-02 Thread Erik Hetzner
At Thu, 2 Apr 2009 19:29:49 +0100,
Rob Sanderson wrote:
 All I meant by that was that the info:doi/ URI is more informative as to
 what the identifier actually is than just the doi by itself, which could
 be any string.  Equally, if I saw an SRW info URI like:
 
 info:srw/cql-context-set/2/relevance-1.0
 
 that's more informative than some ad-hoc URI for the same thing.
 Without the external knowledge that info:doi/xxx is a DOI and
 info:srw/cql-context-set/2/ is a cql context set administered by the
 owner with identifier '2' (which happens to be me), then they're still
 just opaque strings.

Yes, info:doi/10./xxx is more easily recognizable (‘sniffable’) as
a DOI than 10./xxx, both for humans and machines.

If we don’t know, by some external means, that a given string has the
form of some identifier, then we must guess, or sniff it.

But it is good practice to use other means to ensure that we know
whether or not any given string is an identifier, and if it is, what
type it is. Otherwise we can get confused by strings like go:home. Was
that a URI or not?
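To make the sniffing problem concrete, here is a small Python sketch. It assumes only the generic scheme syntax from RFC 3986 (a letter followed by letters, digits, "+", "-" or "."); the function name is made up for the example.

```python
import re

# Generic scheme syntax per RFC 3986: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
# followed by a colon.
SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):")


def sniff_scheme(text):
    """Return the lower-cased scheme if `text` looks like a URI, else None."""
    match = SCHEME_RE.match(text)
    return match.group(1).lower() if match else None
```

By syntax alone, go:home sniffs as a URI with the scheme "go", just as info:doi/10./xxx sniffs as "info", while a bare 10./xxx does not look like a URI at all. Nothing in the string says whether "go" was ever meant as a scheme, which is why some out-of-band statement that "this field holds a URI" is still needed.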

That said, I see no reason why the URI:

info:srw/cql-context-set/2/relevance-1.0

is more informative than the URI:

http://srw.org/cql-context-set/2/relevance-1.0

As you say, both are just opaque URIs without the additional
information. This information is provided by, in the first case, the
info-uri registry people, or, in the second case, by the organization
that owns srw.org.

 I could have said that http://srw.cheshire3.org/contextSets/rel/ was the
 identifier for it (SRU doesn't care) but that's the location for the
 retrieval documentation for the context set, not a collection of
 abstract access points.
 
 If srw.cheshire3.org was to go away, then people can still happily use
 the info URI with the continued knowledge that it shouldn't resolve to
 anything.

If srw.cheshire3.org goes away, people can still happily use the http
URI. (see below)

 With the potential dissolution of DLF, this has real implications, as
 DLF have an info URI namespace.  If they'd registered a bunch of URIs
 with diglib.org instead, which will go away, then people would have
 trouble using them.  Notably when someone else grabs the domain and
 starts using the URIs for something else.

The original URIs are still just as useful as identifiers, they have
become less useful as dereferenceable identifiers.

 Now if DLF were to disband AND reform, then they can happily go back to
 using info:dlf/ URIs even if they have a brand new domain.

The info:dlf/ URIs would be the same non-dereferenceable URIs they
always were, true. But what have we gained?

The issue of persistence of dereferenceability is a real one. There are
solutions, e.g., other organizations can step in to host the domain;
the ARK scheme; or, we can all agree that the diglib.org domain is too
important to let be squatted, and agree that URIs that begin
http://diglib.org/ are special, and should by-pass DNS. [1]

  I think that all of us in this discussion like URIs. I can’t speak
  for, say, Andrew, but, tentatively, I think that I prefer
  info:doi/10./xxx to plain 10.111/xxx. I would just prefer
  http://dx.doi.org/10./xxx
 
 info URIs, In My Opinion, are ideally suited for long term
 identifiers of non information resources. But http URIs are
 definitely better than something which isn't a URI at all.

Something we can all agree on! URIs are better than no URIs.

best,
Erik

1. Take with a grain of salt, as this is not something I have fully
thought out the implications of.
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3



Re: [CODE4LIB] registering info: uris?

2009-04-01 Thread Ed Summers
On Wed, Apr 1, 2009 at 6:14 AM, Mike Taylor m...@indexdata.com wrote:
 As usual, an ounce of example is worth a ton of exposition, so:

 Suppose I always keep a PDF of my latest paper at
        http://www.miketaylor.org.uk/latest.pdf
 for the benefit of people who want to keep an eye on my research.
 (Hey, it might happen!)  Today, I have a PDF there of a paper with the
 DOI 10./j.1475-4983.2007.00728.x.  Tomorrow, my new paper comes
 out, and I replace the old one with a PDF of that new paper whose DOI
 is 10.abcdefghij.  I move the PDF of the old paper to
        http://www.miketaylor.org.uk/previous.pdf

 Now, then -- the DOIs are identifiers: they are not in themselves
 dereferenceable (although of course they can be used as keys for some
 mechanism that knows how to dereference them).  Each DOI always
 identifies the same Thing.  The URLs are locations: they are
 dereferenceable, but they do not give you any guarantee about what you
 will find at that location.  Two different days, two different papers.
 Note that a single location (latest.pdf) contains at different times
 two different Things.  And note that a single Thing (the older of the
 two papers) can be found at different times in two different
 locations.  In contrast, the same identifier always identifies the
 same Thing, irrespective of what location it's at.

Hoorah for examples!

Assuming a world where you cannot de-reference this DOI what is it good for?

//Ed


Re: [CODE4LIB] registering info: uris?

2009-04-01 Thread Keith Jenkins
On Wed, Apr 1, 2009 at 8:37 AM, Mike Taylor m...@indexdata.com wrote:
 Worse, consider how the actionable-identifier approach would translate
 to other non-actionable identifiers like ISBNs.  If I offer the
 non-actionable identifier
        info:isbn/025490
 which identified Farlow and Brett-Surman's edited volume The Complete
 Dinosaur, it's obvious that you have a choice of methods for
 resolving the ISBN

... but the identifier gives no indication of what those choices might
be, and I wouldn't even be able to find out anything more about the
info:isbn scheme unless I happened to know that http://info-uri.info/
is the registry for info: URIs (or could Google my way to it).

An http: identifier could at least take you to general information
about the scheme (perhaps with options for resolution), if not
directly to some description of the identified thing itself.

Keith


Re: [CODE4LIB] registering info: uris?

2009-04-01 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Mike Taylor
 Sent: Wednesday, April 01, 2009 8:38 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] registering info: uris?
 
 Ross Singer writes:
   I suppose my point is, there's a valid case for identifiers like
   your doi, I think we can agree on that (well, we don't have to
   agree, these identifiers will exist and continue to exist long
   after we've grown tired of flashing out gang signs).  What I don't
   understand is the reason to express that identifier as:
  
   info:doi/10./j.1475-4983.2007.00728.x
  
   when
  
   http://dx.doi.org/10./j.1475-4983.2007.00728.x
  
   can serve exactly the same function *and* be actionable.

This was exactly the point I was making, but you said it much more
coherently than what I said, Ross.  If you are going to use a 
natural identifier, like doi, isbn, lccn, etc., then use it, but 
if you are going to Web-ify that natural identifier, use an HTTP 
URI.  It doesn't need to be actionable today, but can be tomorrow, 
without anybody having to write a new resolution mechanism and 
clients having to integrate that new resolution mechanism in their
systems.  Typically, most resolution mechanisms for unresolvable 
URI schemes use HTTP URIs anyway and amount to:

http://resolve.example.org/?uri=info:isbn/141574338X 
http://resolve.example.org/?uri=urn:isbn:141574338X 

which could have just been:

http://isbn.info/141574338X

 The problem with the latter identifier (and to be clear, yes, I agree
 that it COULD function as an identifier) is that it gives the
 impression that what you get when you dereference the DOI is that
 specific resource, i.e. it enshrines dx.doi.org as THE way of
 dereferencing DOIs.

I agree that Ross's DOI example could function as an identifier.  I think 
we can agree that RFC 3986 says that URIs are just tokens with a specified 
syntax.  Nothing in RFC 3986 says that a URI has to be actionable.

You are talking about an impression that isn't enshrined in RFC 3986.
It might be better to think about this in terms of the W3C's Cool URIs
for the Semantic Web document.  That document classifies URIs into
three types: Real World Objects, Generic Documents and Web Documents.  So
which type is: http://dx.doi.org/10./j.1475-4983.2007.00728.x?

It depends.  If I say that it is a Real World Object, it’s an identifier
for the actual DOI identifier.  If I say that it is a Web Document, then 
dereferencing it will give me a specific resource.  In this case I can 
have and probably should have both a Real World Object URI and a Web 
Document URI.

 What if I don't want to get the article from dx.doi.org?  Maybe if I
 go via that site, it'll point me to Elsevier's pay-for copy of an
 article, whereas if I'd fed the DOI to my local library's resolver, it
 would have sent me to Blackwell's version which the library has a
 subscription for.  An actionable URI mandates (or at least strongly
 suggests) a particular course of action: but I don't want you to tell
 me what to _do_, I just want you to tell me what the Thing is.

People wanting to identify the DOI use the Real World Object URI and
people wanting to find out information about the DOI use the Web 
Document URI.

Both these URIs, Real World Object and Web Document, are HTTP URIs, so 
there is little if any value in using info or URN URIs.  People *tend* 
to use URN URIs because RFC 2141 states that the URI has persistence, 
and people *tend* to use info URIs because RFC 4452 states 
there is no persistence.  However, persistence is a policy statement 
made by the minter of a URI.  You can make a persistence policy 
statement about any URI, including HTTP URIs.


Andy.


Re: [CODE4LIB] registering info: uris?

2009-04-01 Thread Rob Sanderson
On Wed, 2009-04-01 at 14:17 +0100, Mike Taylor wrote:
 Ed Summers writes:
   Assuming a world where you cannot de-reference this DOI what is it
   good for?
 
 It wouldn't be good for much if you couldn't dereference it at all.
 The point is that (I argue) the identifier shouldn't tie itself to a
 particular dereferencing mechanism (such as dx.doi.org, or amazon.com)
 but should be dereferenced by software that knows what's the most
 appropriate dereferencing mechanism _for you_ in your situation, with
 your subscriptions, at particular distances from specific libraries,
 etc.

Heh, that sounds like a good idea. Maybe we could call it an OpenURL?

And that distinction about having a dereferencing mechanism sounds okay,
but let's call it a ... service. Then we could define an architecture
for that sort of thing rather than a Resource oriented one.  We could
call it a Service Oriented Architecture.

Oh, wait... 

Rob


Re: [CODE4LIB] registering info: uris?

2009-04-01 Thread Mike Taylor
Houghton,Andrew writes:
   The point is that (I argue) the identifier shouldn't tie itself
   to a particular dereferencing mechanism (such as dx.doi.org, or
   amazon.com) but should be dereferenced by software that knows
   what's the most appropriate dereferencing mechanism _for you_ in
   your situation, with your subscriptions, at particular distances
   from specific libraries, etc.
  
  Let's separate your argument into two pieces.  Identification and
  resolution.  The DOI is the identifier and it inherently doesn't
  tie itself to any resolution mechanism.

Yes.  So far, we agree :-)

  So creating an info URI for it is meaningless, it's just another
  alias for the DOI.

Not quite.  Embedding a DOI in an info URI (or a URN) means that the
identifier describes its own type.  If you just get the naked string
10./j.1475-4983.2007.00728.x
passed to you, say as an rft_id in an OpenURL, then you can't tell
(except by guessing) whether it's a DOI, a SICI, an ISBN or a
biological species identifier.  But if you get
info:doi/10./j.1475-4983.2007.00728.x
then you know what you've got, and can act on it accordingly.

  I can create an HTTP resolution mechanism for DOI's by doing:
  
  http://resolve.example.org/?doi=10./j.1475-4983.2007.00728.x
  
  or
  
  http://resolve.example.org/?uri=info:doi/10./j.1475-4983.2007.00728.x
  
  since the info URI contains the natural DOI identifier, wrapping
  it in a URI scheme has no value when I could have used the DOI
  identifier directly, as in the first HTTP resolution example.

In this case, you're right -- because the parameter name "doi" tells
you what vocabulary the identifier is drawn from, much as the prefix
of an XML element name tells you what namespace it's drawn from.  But
in general, when you can't rely on having that extra bit of data
floating around alongside the actual identifier (as in the OpenURL
rft_id example) it's nice to have identifiers that are
self-describing.
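
As a rough illustration of "self-describing", the following sketch splits an info: URI into its namespace and the identifier it wraps. The function name parse_info_uri is hypothetical, and the DOI is copied exactly as it appears in this thread. A naked string yields nothing to act on, whereas the wrapped form tells the receiver which vocabulary it has been handed.

```python
def parse_info_uri(uri):
    """Split an info: URI into (namespace, identifier); return None if it is not one."""
    scheme, colon, rest = uri.partition(":")
    if not colon or scheme.lower() != "info":
        return None
    namespace, slash, identifier = rest.partition("/")
    if not slash or not namespace or not identifier:
        return None
    return namespace, identifier

DOI = "10./j.1475-4983.2007.00728.x"  # as written in this thread

print(parse_info_uri("info:doi/" + DOI))  # namespace and identifier are recoverable
print(parse_info_uri(DOI))                # the naked string gives no type information
```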

 _/|____
/o ) \/  Mike Taylor    m...@indexdata.com    http://www.miketaylor.org.uk
)_v__/\  If only there were some EASY, COWARDLY way out of this --
 Bob the Angry Flower, www.angryflower.com


Re: [CODE4LIB] registering info: uris?

2009-04-01 Thread Eric Hellman

I'll bite.

There are actually a number of http URLs that work like 
http://dx.doi.org/10./j.1475-4983.2007.00728.x
One of them is http://doi.wiley.com/10./j.1475-4983.2007.00728.x
Another is run by CrossRef; some OpenURL link servers also have doi  
proxy capability.
So for code to extract the doi reliably from http urls, the code needs  
to know all the possibilities for the doi proxy stem. The proxies also  
tend to have optional parameters that can control the resolution. In  
principle, the info:doi/ stem addresses this.
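
A minimal sketch of the bookkeeping described above, assuming a hand-maintained set of proxy hosts. Only the two hosts named in this message are listed; a real list would be longer and would keep growing. The names KNOWN_DOI_PROXIES and extract_doi are made up for illustration.

```python
from urllib.parse import urlsplit, unquote

# Proxy hosts mentioned in this thread; anything beyond these is unknown to the code.
KNOWN_DOI_PROXIES = {"dx.doi.org", "doi.wiley.com"}

def extract_doi(uri):
    """Return the DOI carried by an info:doi/ URI or a known proxy URL, else None."""
    prefix = "info:doi/"
    if uri.lower().startswith(prefix):
        return uri[len(prefix):]
    parts = urlsplit(uri)
    if parts.scheme in ("http", "https") and parts.hostname in KNOWN_DOI_PROXIES:
        # The query string (optional proxy parameters) is deliberately ignored.
        return unquote(parts.path.lstrip("/")) or None
    return None

print(extract_doi("http://doi.wiley.com/10./j.1475-4983.2007.00728.x"))
print(extract_doi("http://resolver.example.org/10./j.1475-4983.2007.00728.x"))
```

The info:doi/ branch needs no table at all, which is the advantage being claimed; the http branch only works for hosts the code already knows about.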


On Apr 1, 2009, at 7:27 AM, Ross Singer wrote:

 What I don't understand is the
reason to express that identifier as:

info:doi/10./j.1475-4983.2007.00728.x

when

http://dx.doi.org/10./j.1475-4983.2007.00728.x



Eric Hellman

e...@hellman.net (personal)
http://hellman.net/eric/


Re: [CODE4LIB] registering info: uris?

2009-04-01 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Mike Taylor
 Sent: Wednesday, April 01, 2009 9:35 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] registering info: uris?
 
 Houghton,Andrew writes:
   So creating an info URI for it is meaningless, it's just another
   alias for the DOI.
 
 Not quite.  Embedding a DOI in an info URI (or a URN) means that the
 identifier describes its own type.  If you just get the naked string
   10./j.1475-4983.2007.00728.x
 passed to you, say as an rft_id in an OpenURL, then you can't tell
 (except by guessing) whether it's a DOI, a SICI, an ISBN or a
 biological species identifier.  But if you get
   info:doi/10./j.1475-4983.2007.00728.x
 then you know what you've got, and can act on it accordingly.

Now you are changing the argument to a specific resolution mechanism,
e.g., OpenURL.  OpenURL could have easily defined rft_idType where
you specified DOI, SICI, ISBN, etc. along with its actual identifier
value in rft_id.  However, given that OpenURL didn't do this, there
is no difference plugging either of the following URIs into rft_id:

http://dx.doi.org/10./j.1475-4983.2007.00728.x
info:doi/10./j.1475-4983.2007.00728.x 

when I identify the HTTP URI as a Real World Object.  This was the
whole point of the W3C TAG httpRange-14 decision which the Cool
URIs for the Semantic Web document is based on.
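
For readers who have not met httpRange-14, the decision can be paraphrased as a rule about what an HTTP response code lets you conclude about the thing a URI identifies. The sketch below encodes that paraphrase; the function name and the returned labels are illustrative assumptions, not terms from the TAG text.

```python
def classify_by_status(status):
    """Paraphrase of the httpRange-14 rule: what a GET response code tells a client."""
    if 200 <= status < 300:
        # A 2xx answer means the URI identifies an information resource (a document).
        return "information resource"
    if status == 303:
        # A 303 redirect means the URI could identify anything, including a
        # real-world object; the Location header points at a description of it.
        return "any resource"
    if 400 <= status < 500:
        return "unknown"
    return "not covered"

for code in (200, 303, 404):
    print(code, classify_by_status(code))
```

Under this reading, an HTTP URI serves as a Real World Object identifier when the server answers it with a 303 rather than with a document.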

So again, wrapping the natural DOI in an unresolvable URI scheme
is meaningless.  When talking about resolution mechanisms any number
of implementations are possible, including separating an identifier
type from its value or conflating the two.  In the two URIs above
the only real differences are:

1) http: vs. info: URI scheme
2) an authority named: dx.doi.org vs. doi

These are just simple substitutions.  Whoever registered the info URI
for doi could have easily applied for an authority named: dx.doi.org
instead of just doi, then the only difference would be the URI scheme.


Andy.


Re: [CODE4LIB] registering info: uris?

2009-04-01 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Eric Hellman
 Sent: Wednesday, April 01, 2009 9:51 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] registering info: uris?
 
 There are actually a number of http URLs that work like
 http://dx.doi.org/10./j.1475-4983.2007.00728.x
 One of them is http://doi.wiley.com/10./j.1475-4983.2007.00728.x
 Another is run by CrossRef; some OpenURL link servers also have doi
 proxy capability.
 So for code to extract the doi reliably from http urls, the code needs
 to know all the possibilities for the doi proxy stem. The proxies also
 tend to have optional parameters that can control the resolution. In
 principle, the info:doi/ stem addresses this.

Again we have moved the discussion to a specific resolution mechanism,
e.g., OpenURL.  OpenURL could have been defined differently, such
that rft_id and rft_idScheme were available and you used the actual
DOI value and specified the scheme of the identifier.  Then the issue
of extraction of the identifier value from the URI goes away, because
there is no URI needed.


Andy.


Re: [CODE4LIB] registering info: uris?

2009-04-01 Thread Mike Taylor
Houghton,Andrew writes:
   From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
   Eric Hellman
   Sent: Wednesday, April 01, 2009 9:51 AM
   To: CODE4LIB@LISTSERV.ND.EDU
   Subject: Re: [CODE4LIB] registering info: uris?
   
   There are actually a number of http URLs that work like
   http://dx.doi.org/10./j.1475-4983.2007.00728.x
   One of them is http://doi.wiley.com/10./j.1475-4983.2007.00728.x
   Another is run by CrossRef; some OpenURL link servers also have doi
   proxy capability.
   So for code to extract the doi reliably from http urls, the code needs
   to know all the possibilities for the doi proxy stem. The proxies also
   tend to have optional parameters that can control the resolution. In
   principle, the info:doi/ stem addresses this.
  
  Again we have moved the discussion to a specific resolution mechanism,
  e.g., OpenURL.  OpenURL could have been defined differently, such
  that rft_id and rft_idScheme were available and you used the actual
  DOI value and specified the scheme of the identifier.  Then the issue
  of extraction of the identifier value from the URI goes away, because
  there is no URI needed.

Yes, that would have been OK, too.  But no doubt there are other
contexts where it's possible to pass in an identifier without also
being able to say "and by the way, it's of type XYZ".  Surely you
don't disagree that it's good for identifiers to be self-describing?

It's the same with actionable URLs: isn't it better that I can tell
you:

http://www.miketaylor.org.uk/dino/pubs/

Instead of having to say:

www.miketaylor.org.uk/dino/pubs/
Oh, by the way, access this using HTTP rather than FTP.

 _/|____
/o ) \/  Mike Taylor    m...@indexdata.com    http://www.miketaylor.org.uk
)_v__/\  A Linux system requires rebooting about as often as a Windoze
 system requires re-installing -- David Joffe.


[CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-01 Thread Jonathan Rochkind

Houghton,Andrew wrote:

Let's separate your argument into two pieces. Identification and
resolution.  The DOI is the identifier and it inherently doesn't
tie itself to any resolution mechanism.  So creating an info URI
for it is meaningless, it's just another alias for the DOI.  I 
can create an HTTP resolution mechanism for DOI's by doing:


http://resolve.example.org/?doi=10./j.1475-4983.2007.00728.x

or

http://resolve.example.org/?uri=info:doi/10./j.1475-4983.2007.00728.x

since the info URI contains the natural DOI identifier, wrapping it
in a URI scheme has no value when I could have used the DOI identifier
directly, as in the first HTTP resolution example.
  


I disagree that wrapping it in a URI scheme has no value.  We have a 
great deal of software and many schemas that are built to store URIs; 
even if they don't know what the URI is or what can be done with it, we 
have infrastructure in place for dealing with URIs.


So there is value in wrapping a 'natural' identifier in a URI, even if 
that URI does not carry its own resolution mechanism with it. I have 
run into this in several places in my own work.


I share Mike's concerns about tying resolution to identification in one 
mechanism.  As a sort of general principle or 'pattern' or design, 
trying to make one mechanism do two jobs at once is a 'bad smell'.  It's 
in fact (I hope this isn't too far afield) how I'd sum up much of the 
failure of AACR2/MARC, involving our 'controlled headings' (see me 
expanding on this in some blog posts at 
http://bibwild.wordpress.com/2008/01/17/identifiers-and-display-labels-again/).



On the other hand, it is awfully _convenient_ to combine these two 
functions in one mechanism. And convenience does matter too.


I can see both sides. So I think we just do what feels right, and when 
we all disagree on what feels right, we pick one. I don't share the 
opinion of those who think it's obvious that everything should be an 
http uri, nor do I share the opinion of those who think it's obvious 
that this is a disaster.


DOI is definitely one good example of where One Canonical Resolution 
fails.  The DOI _resolution_ system fails for me -- it does not reliably 
or predictably deliver the right document for my users.  But a DOI as an 
identifier is still useful for me.  Even if that DOI were expressed in a 
URI as http://dx.doi.org/resolve/10./j.1475-4983.2007.00728.x, I 
STILL wouldn't actually use the HTTP server at dx.doi.org to resolve 
it.  I'd extract the actual DOI out of it, and use a different 
resolution mechanism.
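
A sketch of that last step, handing an already-extracted DOI to a different resolver. The resolver address is entirely hypothetical, and passing the identifier as rft_id follows the OpenURL usage discussed elsewhere in this thread; a real link server may expect additional parameters.

```python
from urllib.parse import urlencode

# Hypothetical local link resolver; substitute whatever your library actually runs.
LOCAL_RESOLVER = "http://resolver.example.edu/openurl"

def local_resolver_url(doi):
    """Build a request to the local resolver, carrying the DOI as an info: URI."""
    query = urlencode({"rft_id": "info:doi/" + doi})
    return LOCAL_RESOLVER + "?" + query

print(local_resolver_url("10./j.1475-4983.2007.00728.x"))
```

The identifier is reused unchanged; only the place it is sent differs, which is the separation of identification from resolution being argued for here.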


Another example to think about is what happens when the protocol for 
resolution changes?  Right now already we could find a resolution 
service starting to make available and/or insist upon https protocol 
resolution.  But all those existing identifiers expressed as http URIs 
should not change, they are meant to be persistent. So already it's 
possible for an identifier originally intended to describe its own 
resolution to be slightly wrong.  Is this confusing? In the future, 
maybe we'll have something different from http entirely.



Jonathan


Re: [CODE4LIB] registering info: uris?

2009-04-01 Thread Jon

+1

Jon Stroop
Metadata Analyst
C-17-D2 Firestone Library
Princeton University
Princeton, NJ 08544

Email: jstr...@princeton.edu
Phone: (609)258-0059
Fax: (609)258-0441

http://diglib.princeton.edu
http://diglib.princeton.edu/ead



Edward M. Corrado wrote:

I disagree. Keep this going. A delete key is in easy reach and if you
have a mail reader that does threading you can easily ignore the
thread. I have been finding this discussion rather educational.

Edward

On Wed, Apr 1, 2009 at 10:14 AM, Glen Newton - NRC/CNRC CISTI/ICIST
Research glen.new...@nrc-cnrc.gc.ca wrote:
  

I count 75 messages on this topic. Perhaps it is time to take this off
list? Someone give us a summary when/if this is resolved? Or start a
new list for this issue and tell us where it is?

thanks,

Glen



From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Eric Hellman
Sent: Wednesday, April 01, 2009 9:51 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] registering info: uris?

There are actually a number of http URLs that work like
http://dx.doi.org/10./j.1475-4983.2007.00728.x
One of them is http://doi.wiley.com/10./j.1475-4983.2007.00728.x
Another is run by CrossRef; some OpenURL link servers also have doi
proxy capability.
So for code to extract the doi reliably from http urls, the code needs
to know all the possibilities for the doi proxy stem. The proxies also
tend to have optional parameters that can control the resolution. In
principle, the info:doi/ stem addresses this.
  

Again we have moved the discussion to a specific resolution mechanism,
e.g., OpenURL.  OpenURL could have been defined differently, such
that rft_id and rft_idScheme were available and you used the actual
DOI value and specified the scheme of the identifier.  Then the issue
of extraction of the identifier value from the URI goes away, because
there is no URI needed.


Andy.




Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-01 Thread Jonathan Rochkind
I admit that httprange-14 still confuses me. (I have no idea why it's 
called httprange-14 for one thing).


But how do you identify the URI as being a Real World Object? I don't 
understand what it entails.


And "http://doi.org/*" describes its own type only to software that 
knows what a URI beginning http://doi.org means, right? 

What about Eric Hellman's point that there are a variety of possible 
http URIs (not just possible but _in use_) that encapsulate a DOI, and 
given software would have to know all of the possible templates (with 
more being created all the time)?


Jonathan

Houghton,Andrew wrote:

From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Jonathan Rochkind
Sent: Wednesday, April 01, 2009 11:08 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] resolution and identification (was Re: [CODE4LIB]
registering info: uris?)

Houghton,Andrew wrote:


Let's separate your argument into two pieces. Identification and
resolution.  The DOI is the identifier and it inherently doesn't
tie itself to any resolution mechanism.  So creating an info URI
for it is meaningless, it's just another alias for the DOI.  I
can create an HTTP resolution mechanism for DOI's by doing:

http://resolve.example.org/?doi=10./j.1475-4983.2007.00728.x

or

http://resolve.example.org/?uri=info:doi/10./j.1475-4983.2007.00728.x

since the info URI contains the natural DOI identifier, wrapping it
in a URI scheme has no value when I could have used the DOI identifier
directly, as in the first HTTP resolution example.

  

I disagree that wrapping it in a URI scheme has no value.  We have very
much software and schemas that are built to store URIs, even if they
don't know what the URI is or what can be done with it, we have
infrastructure in place for dealing with URIs.



Oops... that should have read ... wrapping it in an unresolvable URI
scheme...

The point being that:

urn:doi:*
info:doi:*

provide no advantages over:

http://doi.org/*

when, per W3C TAG httpRange-14 decision you identify the URI as being a 
Real World Object.  When identifying the HTTP URI as a Real World Object,

it is the same as what Mike said about the info URI that: the identifier
describes its own type.


Andy.

  


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-01 Thread Ross Singer
On Wed, Apr 1, 2009 at 11:37 AM, Jonathan Rochkind rochk...@jhu.edu wrote:
 I admit that httprange-14 still confuses me. (I have no idea why it's
 called httprange-14 for one thing).

http://www.w3.org/2001/tag/group/track/issues/14

Some background:
http://efoundations.typepad.com/efoundations/2009/02/httprange14-cool-uris-frbr.html

 And "http://doi.org/*" describes its own type only to software that
 knows what a URI beginning http://doi.org means, right?

How is that different from the software knowing what info:doi/ means?
The difference is, how much more software knows what http: means vs.
info:?

And this, I think, has got to be the point here.  How many times do we
need to marginalize ourselves with our ideals and expectations that
nobody else adheres to before we're rendered completely irrelevant?

Doesn't it make sense to coopt the mainstream processes and apply them
to our ideals?  What, exactly, is the resistance here?

 What about Eric Hellman's point that there are a variety of possible http
 URIs (not just possible but _in use_) that encapsulate a DOI, and given
 software would have to know all of the possible templates (with more being
 created all the time)?

Right, but here again is where we're talking about the difference
between a location and the identifier.

We're talking about establishing
http://dx.doi.org/10./j.1475-4983.2007.00728.x

(or something like that --
http://hdl.handle.net/10./j.1475-4983.2007.00728.x might be more
appropriate)

as the identifier for doi:10./j.1475-4983.2007.00728.x

That you can access it via
http://doi.wiley.com/10./j.1475-4983.2007.00728.x (or resolve it
there) doesn't mean that that's the identifier for it.

-Ross.


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-01 Thread Ray Denenberg, Library of Congress

From: Houghton,Andrew hough...@oclc.org


The point being that:

urn:doi:*
info:doi:*

provide no advantages over:

http://doi.org/*



I think they do.

I realize this is pretty much a dead-end debate as everyone has dug 
themselves into a position and nobody is going to change their mind. It is a 
philosophical debate and there isn't a right answer.  But in my opinion 


I won't use the doi example because it's overloaded.  Let's talk about the 
hypothetical sudoc. I think info:sudoc/xyz provides an advantage over: 
http://sudoc.org/xyz   if the latter is not going to resolve.


Why? Because it drives me nuts to see http URIs everywhere that give all 
appearances of resolvability - browsers, editors, etc.  turn them into 
clickable links.   Now, if you are setting up a resolution service where you 
get the document that the sudoc identifies when you click on the URI, then 
http is appropriate.   The *actual document*. Not a description of it in 
lieu of the document.  And the so-called architectural justification that 
it's ok to return metadata instead of the resource (representation) -- I 
don't buy it.


--Ray 


Re: [CODE4LIB] registering info: uris?

2009-04-01 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Mike Taylor
 Sent: Wednesday, April 01, 2009 10:17 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] registering info: uris?
 
 Houghton,Andrew writes:
   Again we have moved the discussion to a specific resolution
   mechanism, e.g., OpenURL.  OpenURL could have been defined
   differently, such that rft_id and rft_idScheme were available and
   you used the actual DOI value and specified the scheme of the
   identifier.  Then the issue of extraction of the identifier value
   from the URI goes away, because there is no URI needed.
 
 Yes, that would have been OK, too.  But no doubt there are other
 contexts where it's possible to pass in an identifier without also
 being able to say "and by the way, it's of type XYZ".  Surely you
 don't disagree that it's good for identifiers to be self-describing?

Ok, now we moved the discussion back to identifiers rather than
resolution mechanisms.  Absolutely agree that it's good for
identifiers to be self-describing, I wasn't saying otherwise.
However, let's take the following URIs:

http://any.identifier.org/?scheme=doi&id=10./j.1475-4983.2007.00728.x
info:doi/10./j.1475-4983.2007.00728.x
urn:doi:10./j.1475-4983.2007.00728.x

All three are self-describing URIs.  The HTTP URI does exactly the same thing
as the info URI without having to create a new URI scheme, e.g., info, which is
the argument made by IETF and W3C against the creation of info URIs.  Also,
since the info URI folks actually created a domain name for registering info
URIs, you could have easily changed any.identifier.org to info-uri.info
to achieve the same effect as the info URI.

 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Mike Taylor
 Sent: Wednesday, April 01, 2009 10:44 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] registering info: uris?

 Imagine your web-browser extended by a plugin that knows how to
 resolve particular kinds of info: URLs.  If you just paste the raw
 DOI into the URI bar, it won't have a clue what to do with it, but the
 wrapped-in-a-URI version stands alone and self-describes, so the
 plugin can pull it apart and say, "ah yes, this URI is a DOI, and I
 know how my user has configured me to resolve those".

Sure you can imagine a web-browser plugin, but these things never happen
because of a) the cost of developing them, or b) the need for a separate
plugin for every type of browser.  This is why the Architecture of the
World Wide Web document states:

   While Web architecture allows the definition of new schemes, introducing
   a new scheme is costly. Many aspects of URI processing are
   scheme-dependent, and a large amount of deployed software already
   processes URIs of well-known schemes. Introducing a new URI scheme
   requires the development and deployment not only of client software to
   handle the scheme, but also of ancillary agents such as gateways,
   proxies, and caches. See [RFC2718] for other considerations and costs
   related to URI scheme design.

 What you seem to be suggesting (are you?) is that in the former case, the 
 resolver should recognise that the HTTP URL matches the regular expression
   /^http:\/\/dx\.doi\.org\/(.*)$/
 and so extract the match and go off and do something else with it.

Back to resolution mechanisms... I'm not suggesting anything.  You are
suggesting a resolution mechanism implementation which uses regular
expressions.  That is one of many ways a resolution mechanism can retrieve
the embedded DOI or identifier of choice.  URI Templates is another, and
given this URI:

http://any.identifier.org/?scheme=doi&id=10./j.1475-4983.2007.00728.x

any Web library on the planet can pull the query parameters out of the URI.
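
As a small illustration of that claim, here is a sketch using Python's standard urllib.parse. It assumes the two parameters in the example URI are separated by "&" (the list archive appears to have dropped the ampersand); the function name scheme_and_id is made up.

```python
from urllib.parse import urlsplit, parse_qs

# The example URI from this message, with "&" assumed between the two parameters.
URI = "http://any.identifier.org/?scheme=doi&id=10./j.1475-4983.2007.00728.x"

def scheme_and_id(uri):
    """Pull the identifier type and value out of the query string."""
    params = parse_qs(urlsplit(uri).query)
    scheme = params.get("scheme", [None])[0]
    ident = params.get("id", [None])[0]
    return scheme, ident

print(scheme_and_id(URI))
```

No regular expression or scheme-specific parser is involved; the cost is that the receiver still has to know that "scheme" and "id" are the parameter names this particular host uses.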

 as the actionable identifier might be something uglier...

A URI is just a token with a predefined syntax, per RFC 3986, used to
identify a resource, which can be an abstract thing, e.g., a Real World
Object, or a representation of a resource, e.g., a Web Document.  One could
postulate that all URIs are ugly.  Whether a URI is ugly or not is
irrelevant.


Andy.


Re: [CODE4LIB] registering info: uris?

2009-04-01 Thread Jonathan Rochkind
I completely disagree.  There are all sorts of useful identifiers I use 
in my work every day that can not be automatically dereferenced.


Jonathan

Ed Summers wrote:

On Wed, Apr 1, 2009 at 9:17 AM, Mike Taylor m...@indexdata.com wrote:
  

It wouldn't be good for much if you couldn't dereference it at all.



I totally agree.

//Ed

  


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-01 Thread Ross Singer
On Wed, Apr 1, 2009 at 12:22 PM, Karen Coyle li...@kcoyle.net wrote:
 But shouldn't we be able to know the difference between an identifier and a
 locator? Isn't that the problem here? That you don't know which it is if it
 starts with http://.

But you do if it starts with http://dx.doi.org

I still don't see the difference.  The same logic that would be
required to parse and understand the info: uri scheme could be used to
apply towards an http uri scheme.
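
A rough sketch of what "the same logic" might look like, assuming a
small hand-maintained table of HTTP prefixes that are known to carry
identifiers.  The table, the function name and the sample DOI used
when calling it are illustrative assumptions, not part of any
registry.

```python
# Hypothetical table: HTTP prefixes assumed to wrap a known identifier type.
HTTP_PREFIXES = {
    "http://dx.doi.org/": "doi",
}


def parse_identifier(uri):
    """Return (namespace, value) for the URI forms handled here, else None."""
    if uri.startswith("info:"):
        # info URIs have the shape info:<namespace>/<identifier>
        namespace, sep, value = uri[len("info:"):].partition("/")
        if sep and namespace and value:
            return namespace, value
        return None
    for prefix, namespace in HTTP_PREFIXES.items():
        if uri.startswith(prefix) and len(uri) > len(prefix):
            return namespace, uri[len(prefix):]
    return None
```

Both branches are a prefix test followed by a split.  The difference
is where the knowledge lives: the info: branch relies on the scheme's
own syntax, while the http: branch relies on the prefix table being
kept up to date.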

-Ross.


Re: [CODE4LIB] registering info: uris?

2009-04-01 Thread Ray Denenberg, Library of Congress

From: Jonathan Rochkind rochk...@jhu.edu
 There are all sorts of useful identifiers I use in my work every day that 
can not be automatically dereferenced.


Even more to the point: there is no sound definition of dereference.  To 
dereference a resource means to retrieve a representation of it. There has 
never been any agreement within the w3c of what constitutes a 
representation.


--Ray 


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-01 Thread Karen Coyle

Ross Singer wrote:

On Wed, Apr 1, 2009 at 12:22 PM, Karen Coyle li...@kcoyle.net wrote:
  

But shouldn't we be able to know the difference between an identifier and a
locator? Isn't that the problem here? That you don't know which it is if it
starts with http://.



But you do if it starts with http://dx.doi.org
  


No, *I* don't. And neither does my email program, since it displayed it 
as a URL (blue and underlined). That's inside knowledge, not part of the 
technology. Someone COULD create a web site at that address, and there's 
nothing in the URI itself to tell me if it's a URI or a URL.


The general convention is that "http://" is a web address, a location. I 
realize that it's also a form of URI, but that's a minority use of http. 
This leads to a great deal of confusion. I understand the desire to use 
domain names as a way to create unique, managed identifiers, but the 
http part is what is causing us problems.


John Kunze's ARK system attempted to work around this by using http to 
retrieve information about the URI, so you're not just left guessing. 
It's not a question of resolution, but of giving you a short list of 
things that you can learn about a URI that begins with http. However, 
again, unless you know the secret you have no idea that those particular 
URI/Ls have that capability. So again we're going beyond the technology 
into some human knowledge that has to be there to take advantage of the 
capabilities. It doesn't seem so far fetched to make it possible for 
programs (dumb, dumb programs) to know the difference between an 
identifier and a location based on something universal, like a prefix, 
without having to be coded for dozens or hundreds of exceptions.


kc


I still don't see the difference.  The same logic that would be
required to parse and understand the info: uri scheme could be used to
apply towards an http uri scheme.

-Ross.


  



--
---
Karen Coyle / Digital Library Consultant
kco...@kcoyle.net http://www.kcoyle.net
ph.: 510-540-7596   skype: kcoylenet
fx.: 510-848-3913
mo.: 510-435-8234



Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-01 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Karen Coyle
 Sent: Wednesday, April 01, 2009 1:06 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] resolution and identification (was Re:
 [CODE4LIB] registering info: uris?)
 
 The general convention is that "http://" is a web address, a location.
 I
 realize that it's also a form of URI, but that's a minority use of
 http.
 This leads to a great deal of confusion. I understand the desire to use
 domain names as a way to create unique, managed identifiers, but the
 http part is what is causing us problems.

http:// is an HTTP URI, defined by RFC 3986, loosely I will agree that
it is a web address.  However, it is not a location.  URIs according
to RFC 3986 are just tokens to identify resources.  These tokens, e.g.,
URIs are presented to protocol mechanisms as part of the dereferencing
process to locate and retrieve a representation of the resource.

People see http: and assume that it means the HTTP protocol so it must
be a locator.  Whoever initially registered the HTTP URI scheme could 
have used web as the token instead and we would all be doing:
web://example.org/.  This is the confusion.  People don't understand 
what RFC 3986 is saying.  It makes no claim that any URI registered 
scheme has persistence or can be dereferenced.  An HTTP URI is just a 
token to identify some resource, nothing more.
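
As a small illustration of the "just a token" point, the sketch below
splits a few URIs into their syntactic parts with Python's standard
library.  Nothing is fetched; the function name and the sample info:
URI used when calling it are assumptions made for the example.

```python
from urllib.parse import urlsplit


def describe(uri):
    """Split a URI into (scheme, authority, path) without dereferencing it."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path
```

Calling describe() on http://example.org/ and on web://example.org/
gives the same authority and path; only the scheme token differs.
Whether anything can be retrieved is a separate question that the
parser never asks.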


Andy.


Re: [CODE4LIB] registering info: uris?

2009-04-01 Thread Ed Summers
On Wed, Apr 1, 2009 at 12:28 PM, Ray Denenberg, Library of Congress
r...@loc.gov wrote:
 Even more to the point: there is no sound definition of dereference.  To
 dereference a resource means to retrieve a representation of it. There has
 never been any agreement within the w3c of what constitutes a
 representation.

So are you not a fan of:

  http://www.w3.org/TR/webarch/#internet-media-type

//Ed


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-01 Thread Ross Singer
My point is that I don't see how they're different in practice.

And one of them actually allowed you to do something from your email client.

-Ross.

On Wed, Apr 1, 2009 at 1:20 PM, Karen Coyle li...@kcoyle.net wrote:
 Ross, I don't get your point. My point was about the confusion between two
 things that begin: http:// but that are very different in practice. What's
 yours?

 kc

 Ross Singer wrote:

 Your email client knew what to do with:

 info:doi/10./j.1475-4983.2007.00728.x ?

 doi:10./j.1475-4983.2007.00728.x ?

 Or did you recognize the info:doi scheme and Google it?

 Or would this, in case of 99% of the world, just look like gibberish
 or part of some nerd's PGP key?

 -Ross.

 On Wed, Apr 1, 2009 at 1:06 PM, Karen Coyle li...@kcoyle.net wrote:


 Ross Singer wrote:


 On Wed, Apr 1, 2009 at 12:22 PM, Karen Coyle li...@kcoyle.net wrote:



 But shouldn't we be able to know the difference between an identifier
 and
 a
 locator? Isn't that the problem here? That you don't know which it is
 if
 it
 starts with http://.



 But you do if it starts with http://dx.doi.org



 No, *I* don't. And neither does my email program, since it displayed it
 as a
 URL (blue and underlined). That's inside knowledge, not part of the
 technology. Someone COULD create a web site at that address, and there's
 nothing in the URI itself to tell me if it's a URI or a URL.

 The general convention is that "http://" is a web address, a location. I
 realize that it's also a form of URI, but that's a minority use of http.
 This leads to a great deal of confusion. I understand the desire to use
 domain names as a way to create unique, managed identifiers, but the http
 part is what is causing us problems.

 John Kunze's ARK system attempted to work around this by using http to
 retrieve information about the URI, so you're not just left guessing.
 It's
 not a question of resolution, but of giving you a short list of things
 that
 you can learn about a URI that begins with http. However, again, unless
 you
 know the secret you have no idea that those particular URI/Ls have that
 capability. So again we're going beyond the technology into some human
 knowledge that has to be there to take advantage of the capabilities. It
 doesn't seem so far fetched to make it possible for programs (dumb, dumb
 programs) to know the difference between an identifier and a location
 based
 on something universal, like a prefix, without having to be coded for
 dozens
 or hundreds of exceptions.

 kc



 I still don't see the difference.  The same logic that would be
 required to parse and understand the info: uri scheme could be used to
 apply towards an http uri scheme.

 -Ross.





 --
 ---
 Karen Coyle / Digital Library Consultant
 kco...@kcoyle.net http://www.kcoyle.net
 ph.: 510-540-7596   skype: kcoylenet
 fx.: 510-848-3913
 mo.: 510-435-8234
 







 --
 ---
 Karen Coyle / Digital Library Consultant
 kco...@kcoyle.net http://www.kcoyle.net
 ph.: 510-540-7596   skype: kcoylenet
 fx.: 510-848-3913
 mo.: 510-435-8234
 



Re: [CODE4LIB] registering info: uris?

2009-04-01 Thread Erik Hetzner
At Wed, 1 Apr 2009 14:34:45 +0100,
Mike Taylor wrote:
 Not quite.  Embedding a DOI in an info URI (or a URN) means that the
 identifier describes its own type.  If you just get the naked string
   10./j.1475-4983.2007.00728.x
 passed to you, say as an rft_id in an OpenURL, then you can't tell
 (except by guessing) whether it's a DOI, a SICI, an ISBN or a
 biological species identifier.  But if you get
   info:doi/10./j.1475-4983.2007.00728.x
 then you know what you've got, and can act on it accordingly.

It seems to me that you are just pushing out by one more level the
mechanism to be able to tell what something is.

That is - before you needed to know that 10./xxx was a DOI. Now
you need to know that info:doi/10./xxx is a URI.

Without external knowledge that info:doi/10./xxx is a URI, I can
only guess.

(Caveat: I have no idea what rft_id, etc, means, so maybe that changes
the meaning of what you are saying from how I read it.)

-Erik
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] registering info: uris?

2009-03-31 Thread Ross Singer
On Tue, Mar 31, 2009 at 5:55 AM, Mike Taylor m...@indexdata.com wrote:
 Identifiers identify; locations locate.

I've been avoiding and ignoring this all day, because I wanted the
thread to die and we all move on with our lives.  But Kevin Clarke
just quoted this on Twitter, and I felt I couldn't let this slide by.

Locations do not locate.  Locations identify 'place'.  They are still
identifiers.

-Ross.


Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Erik Hetzner
At Fri, 27 Mar 2009 20:56:42 -0400,
Ross Singer wrote:

 So, in what is probably a vain attempt to put this debate to rest, I
 created a partial redirect PURL for sudoc:

 http://purl.org/NET/sudoc/

 If you pass it any urlencoded sudoc string, you'll be redirected to
 the GPO's Aleph catalog that searches the sudoc field for that string.

 http://purl.org/NET/sudoc/E%202.11/3:EL%202

 should take you to:
 http://catalog.gpo.gov/F/?func=find-c&ccl_term=GVD%3DE%202.11/3:EL%202

 There, Jonathan, you have a dereferenceable URI structure that you
 A) don't have to worry about pointing at something misleading
 B) don't have to maintain (although I'll be happy to add whoever as a
 maintainer to this PURL)

 If the GPO ever has a better alternative, we just point the PURL at it
 in the future.

Beautiful work, Ross. Thank you.
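
For anyone wanting to build these PURLs programmatically, a minimal
sketch follows.  It assumes only what the message above states: the
base http://purl.org/NET/sudoc/ followed by the urlencoded sudoc
string.  The function name is made up, and keeping "/" and ":"
unescaped is a choice made here to match the example URL; whether the
PURL still redirects today is not something this sketch checks.

```python
from urllib.parse import quote

PURL_BASE = "http://purl.org/NET/sudoc/"


def sudoc_purl(sudoc):
    """Build the PURL for a sudoc string, percent-encoding spaces and the like."""
    # Leave "/" and ":" as they are so the result matches the example above.
    return PURL_BASE + quote(sudoc, safe="/:")
```

With this, sudoc_purl("E 2.11/3:EL 2") produces the same URL as the
example in Ross's message.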

best,
Erik
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Ray Denenberg, Library of Congress

From: Erik Hetzner erik.hetz...@ucop.edu

 I believe that registering a domain would be less
work than going through an info URI registration process, but I don’t
know how difficult the info URI registration process would be (thus
bringing the conversation full circle). [1]



Leaving aside religious issues I just want to be  sure we're clear on one 
point: the work required for the info URI process is exactly the amount of 
work required, no more no less.  It forces you to specify clear syntax and 
semantics, normalization (if applicable), etc.  If you go a different route 
because it's less work, then you're probably avoiding doing work that needs 
to be done.


--Ray 


Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Jonathan Rochkind
That's got a session token in it, Andrew.  Not to mention it will no 
longer resolve to anything whenever GPO changes their ILS platform.


You guys don't seem to believe that I've spent a chunk of time 
investigating all this stuff before I even brought it up here. I did, 
really!


Jonathan

Houghton,Andrew wrote:

From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Jonathan Rochkind
Sent: Friday, March 27, 2009 6:09 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] registering info: uris?

If GPO had a system where I could resolve Sudoc identifiers, then this
whole problem would be solved right there, I wouldn't need to go any
further, I'd just use the http URI's associated with that system as
identifiers! This whole problem statement is because GPO does not
provide any persistent URIs for sudoc's in the first place, right?



With a little Googling how about this:

sudoc: E 2.11/3:EL 2
http://catalog.gpo.gov/F/FIBJ8T23DNC33L6KEDYR7Q8Q3MF6BI9H7Q5XPG4KB3N57HX35X-17544?func=scan&scan_code=SUD&scan_start=E+2.11%2F3%3AEL+2

looks like the param scan_start= holds the sudoc number.  Sure it gives you
other results, but it might work for your purposes.

Seems like they are creating bad HTTP responses since Fiddler throws a protocol
violation because they do not end the HTTP headers with CR,LF,CR,LF and instead 
use LF,LF...
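
The header-termination complaint can be checked against a captured
response with something like the sketch below.  It assumes the raw
response bytes are already in hand (for example from a packet capture
or a proxy log); the function name is hypothetical.

```python
def header_terminator(raw):
    """Report which blank-line sequence ends the header block of raw bytes.

    Returns "CRLF" for CR LF CR LF, "LF" for a bare LF LF, or None if
    no blank line is found at all.
    """
    crlf = raw.find(b"\r\n\r\n")
    lf = raw.find(b"\n\n")
    # Whichever blank line appears first is the one that ends the headers.
    if crlf != -1 and (lf == -1 or crlf < lf):
        return "CRLF"
    if lf != -1:
        return "LF"
    return None
```

HTTP/1.1 specifies CRLF line endings, so a result of "LF" is the
situation described above; many clients tolerate it anyway.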



Andy.
  


Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Jonathan Rochkind

I think this is a good point.

Ray Denenberg, Library of Congress wrote:

 From: Erik Hetzner erik.hetz...@ucop.edu
  

 I believe that registering a domain would be less
work than going through an info URI registration process, but I don’t
know how difficult the info URI registration process would be (thus
bringing the conversation full circle). [1]




Leaving aside religious issues I just want to be  sure we're clear on one 
point: the work required for the info URI process is exactly the amount of 
work required, no more no less.  It forces you to specify clear syntax and 
semantics, normalization (if applicable), etc.  If you go a different route 
because it's less work, then you're probably avoiding doing work that needs 
to be done.


--Ray 
  


Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Jonathan Rochkind
So is there anything wrong with having both that http-based PURL URI 
available, AND an info uri? Not only available, but in common use?


It gets complicated thinking about these things. There are potentially 
several things wrong with it.


Jonathan

Ross Singer wrote:

On Mon, Mar 30, 2009 at 10:12 AM, Ray Denenberg, Library of Congress
r...@loc.gov wrote:
  

Leaving aside religious issues I just want to be  sure we're clear on one
point: the work required for the info URI process is exactly the amount of
work required, no more no less.  It forces you to specify clear syntax and
semantics, normalization (if applicable), etc.  If you go a different route
because it's less work, then you're probably avoiding doing work that needs
to be done.



Avoiding the religious debate that I *think* Ray is referring to (http
vs. info URIs) and instead raising a different religious debate...

I don't have a problem with going through this process to formalize an
info URI once a domain has been thoroughly evaluated and worked out,
but it throws any and all sense of 'agility' out the window and in
many cases, kills any potential hope of actually seeing these
identifiers at all.  The upfront costs are just too high, the details
too arcane and the payoff too low for somebody like Jonathan to solve
an immediate problem.

I'm not saying we shouldn't think these things out beforehand;
recklessness, of course, is not the answer.  Perfection, however,
being the enemy of the good makes me think the info:uri process isn't
a particularly good or efficient one for working with real world
problems.

Add to it that nobody gives a damn about info:uris outside of
libraries, it seems like a total waste of energy.

Although I suppose that strays back into the original religious debate.

-Ross.

  


Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Mike Taylor
Jonathan Rochkind writes:
  So is there anything wrong with having both that http-based PURL URI 
  available, AND an info uri? Not only available, but in common use?

Yes, of course!  You don't want _two_ vocabularies of URIs for SUDOCs!

 _/|____
/o ) \/  Mike Taylor  m...@indexdata.com  http://www.miketaylor.org.uk
)_v__/\  I am not so much afraid of death, as ashamed thereof -- Sir
 Thomas Browne (1605-1682), English physician and author.


Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Ross Singer
There should be no issue with having both, mainly because like I
mentioned earlier, nobody cares about info:uris.

Take, for instance, DOIs.  What do you see in the wild?  Do you ever
see info:uris (except in OpenURLs)?  If you don't see
http://dx.doi.org/ URIs you generally see doi:10... URIs.  It seems
like having http and info URIs would *have* to be fine, since
info:uris *not being dereferenceable* are far less useful (I won't go
so far as 'useless') on the web, which is where all this is happening.

As Ray mentioned earlier in this thread, there is absolutely no reason
an object cannot have multiple identifiers, especially if they stand
to serve somewhat different purposes.

I guess the way I look at it is:

1.  The web is not going to wait for info:uris
2.  The web is not going to use info:uris anyway, even after we've
exhausted all of the corner cases and come up with the perfect URI
model for a given domain, *because there's nothing the web can do with
them anyway*.

-Ross.

On Mon, Mar 30, 2009 at 10:55 AM, Jonathan Rochkind rochk...@jhu.edu wrote:
 So is there anything wrong with having both that http-based PURL URI
 available, AND an info uri? Not only available, but in common use?

 It gets complicated thinking about these things. There are potentially
 several things wrong with it.

 Jonathan

 Ross Singer wrote:

 On Mon, Mar 30, 2009 at 10:12 AM, Ray Denenberg, Library of Congress
 r...@loc.gov wrote:


 Leaving aside religious issues I just want to be  sure we're clear on one
 point: the work required for the info URI process is exactly the amount
 of
 work required, no more no less.  It forces you to specify clear syntax
 and
 semantics, normalization (if applicable), etc.  If you go a different
 route
 because it's less work, then you're probably avoiding doing work that
 needs
 to be done.


 Avoiding the religious debate that I *think* Ray is referring to (http
 vs. info URIs) and instead raising a different religious debate...

 I don't have a problem with going through this process to formalize an
 info URI once a domain has been thoroughly evaluated and worked out,
 but it throws any and all sense of 'agility' out the window and in
 many cases, kills any potential hope of actually seeing these
 identifiers at all.  The upfront costs are just too high, the details
 too arcane and the payoff too low for somebody like Jonathan to solve
 an immediate problem.

 I'm not saying we shouldn't think these things out beforehand;
 recklessness, of course, is not the answer.  Perfection, however,
 being the enemy of the good makes me think the info:uri process isn't
 a particularly good or efficient one for working with real world
 problems.

 Add to it that nobody gives a damn about info:uris outside of
 libraries, it seems like a total waste of energy.

 Although I suppose that strays back into the original religious debate.

 -Ross.





Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Rob Sanderson
On Mon, 2009-03-30 at 16:08 +0100, Ross Singer wrote:
 There should be no issue with having both, mainly because like I
 mentioned earlier, nobody cares about info:uris.

s/nobody cares/the web doesn't care/

'The Web' isn't the only use case.  There are plenty of reasons for
having non dereferencable identifiers, for example for things which do
not have a web representation, or have too many web representations to
make favouring one over another a waste of time. For example abstract
concepts.

 I guess the way I look at it is:
 1.  The web is not going to wait for info:uris
 2.  The web is not going to use info:uris anyway, even after we've
 exhausted all of the corner cases and come up with the perfect URI
 model for a given domain, *because there's nothing the web can do with
 them anyway*.

Working As Intended.

If you want an identifier that *explicitly* cannot be dereferenced, then
info URIs are a good choice.  If you want one that can be dereferenced
to some representation of the identified object, then HTTP is the only
choice.

Rob


Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Ross Singer
On Mon, Mar 30, 2009 at 11:18 AM, Ray Denenberg, Library of Congress
r...@loc.gov wrote:
 Nor do people outside of libraries care about identifiers.

Except, of course, for Tim Berners-Lee and anybody who listens to him:
http://www.w3.org/DesignIssues/LinkedData.html

-Ross.


Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Ray Denenberg, Library of Congress

From: Ross Singer rossfsin...@gmail.com

nobody gives a damn about info:uris outside of
libraries, 


Nor do people outside of libraries care about identifiers. 


--Ray


Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Mike Taylor
Ross Singer writes:
  There should be no issue with having both, mainly because like I
  mentioned earlier, nobody cares about info:uris.
  
  Take, for instance, DOIs.  What do you see in the wild?  Do you ever
  see info:uris (except in OpenURLs)?  If you don't see
  http://dx.doi.org/ URIs you generally see doi:10... URIs.  It seems
  like having http and info URIs would *have* to be fine, since
  info:uris *not being dereferenceable* are far less useful (I won't go
  so far as 'useless') on the web, which is where all this is happening.

What on earth does dereferencing have to do with this?

We're talking about an identifier.

 _/|____
/o ) \/  Mike Taylor  m...@indexdata.com  http://www.miketaylor.org.uk
)_v__/\  You can never go back -- only forwards, or stand still.


Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Ross Singer
On Mon, Mar 30, 2009 at 11:17 AM, Rob Sanderson azar...@liverpool.ac.uk wrote:

 If you want an identifier that *explicitly* cannot be dereferenced, then
 info URIs are a good choice.  If you want one that can be dereferenced
 to some representation of the identified object, then HTTP is the only
 choice.

Yes, I completely agree with this, which is why I think it *has* to be
no problem that both info:uris and http uris can co-exist.

I'm not entirely sure of the use case of identifiers that cannot be
dereferenced. I mean, I'm sure they exist (driver's license numbers
might be an example), but I don't see anything in the current info:uri
registry that wouldn't necessarily be better served with an HTTP uri.

-Ross.


Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Jonathan Rochkind
Because the ability to de-reference seems to be the main reason to use 
an HTTP URI as an identifier, and the main reason that some people 
prefer an HTTP URI as an identifier to an info: URI.


Jonathan

Mike Taylor wrote:

Ross Singer writes:
  There should be no issue with having both, mainly because like I
  mentioned earlier, nobody cares about info:uris.
  
  Take, for instance, DOIs.  What do you see in the wild?  Do you ever

  see info:uris (except in OpenURLs)?  If you don't see
  http://dx.doi.org/ URIs you generally see doi:10... URIs.  It seems
  like having http and info URIs would *have* to be fine, since
  info:uris *not being dereferenceable* are far less useful (I won't go
  so far as 'useless') on the web, which is where all this is happening.

What on earth does dereferencing have to do with this?

We're talking about an identifier.

 _/|____
/o ) \/  Mike Taylor  m...@indexdata.com  http://www.miketaylor.org.uk
)_v__/\  You can never go back -- only forwards, or stand still.

  


Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Mike Taylor
Jonathan Rochkind writes:
 Take, for instance, DOIs.  What do you see in the wild?  Do you ever
 see info:uris (except in OpenURLs)?  If you don't see
 http://dx.doi.org/ URIs you generally see doi:10... URIs.  It seems
 like having http and info URIs would *have* to be fine, since
 info:uris *not being dereferenceable* are far less useful (I won't go
 so far as 'useless') on the web, which is where all this is happening.
  
   What on earth does dereferencing have to do with this?
  
   We're talking about an identifier.
 
  Because the ability to de-reference seems to be the main reason to use 
  an HTTP URI as an identifier, and the main reason that some people 
  prefer an HTTP URI as an identifier to an info: URI.

That looks like a plain and simple confusion to me.  Identifiers and
addresses are two quite different things.  That they happen to be
expressed in similar or even identical syntax is an accident of
history.  Surely our experiences with XML namespaces (which do not
exist) have taught us that?

 _/|____
/o ) \/  Mike Taylor  m...@indexdata.com  http://www.miketaylor.org.uk
)_v__/\  Our users will know fear and cower before our software!  Ship it!
 Ship it and let them flee like the dogs they are! -- Klingon
 Programming Mantra


Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Jonathan Rochkind
This is a long argument that's been going on in other communities for a 
long time, Mike.  I can see both sides.


Jonathan

Mike Taylor wrote:

Jonathan Rochkind writes:
 Take, for instance, DOIs.  What do you see in the wild?  Do you ever
 see info:uris (except in OpenURLs)?  If you don't see
 http://dx.doi.org/ URIs you generally see doi:10... URIs.  It seems
 like having http and info URIs would *have* to be fine, since
 info:uris *not being dereferenceable* are far less useful (I won't go
 so far as 'useless') on the web, which is where all this is happening.
  
   What on earth does dereferencing have to do with this?
  
   We're talking about an identifier.
 
  Because the ability to de-reference seems to be the main reason to use 
  an HTTP URI as an identifier, and the main reason that some people 
  prefer an HTTP URI as an identifier to an info: URI.


That looks like a plain and simple confusion to me.  Identifiers and
addresses are two quite different things.  That they happen to be
expressed in similar or even identical syntax is an accident of
history.  Surely our experiences with XML namespaces (which do not
exist) have taught us that?

 _/|____
/o ) \/  Mike Taylor  m...@indexdata.com  http://www.miketaylor.org.uk
)_v__/\  Our users will know fear and cower before our software!  Ship it!
 Ship it and let them flee like the dogs they are! -- Klingon
 Programming Mantra

  


Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Mike Taylor
Houghton,Andrew writes:
 Take, for instance, DOIs.  What do you see in the wild?  Do
 you ever see info:uris (except in OpenURLs)?  If you don't see
 http://dx.doi.org/ URIs you generally see doi:10... URIs.  It
 seems like having http and info URIs would *have* to be fine,
 since info:uris *not being dereferenceable* are far less
 useful (I won't go so far as 'useless') on the web, which is
 where all this is happening.
   
   What on earth does dereferencing have to do with this?
   
   We're talking about an identifier.
  
  Exactly, that is what people don't understand about RFC 3986.  URIs
  are just identifiers and have nothing to do with dereferencing.
  Dereferencing only comes into play when the URI is used with an
  actual protocol like HTTP.  The only thing the http:, e.g., URI
  scheme, starting the URI tells you is what the syntax of the rest
  of the URI looks like.  This is where the authors of info URIs
  missed the boat.  They conflated the URI scheme, e.g., http:, with
  dereferencing and used it as a justification for a new URI scheme.
  The authors were told of that misconception before info became an
  RFC by both the IETF and W3C [...]

... and by me, for what it's worth (remember, Ray? :-)) ...

  [...], but they decided to proceed anyway creating another library
  specific standard that no one else will use.
  
  If people would just follow the prescribed practice by the W3C:
  
  http://www.w3.org/TR/webarch/ 
  Architecture of the Web says:
  
  2.3.1. URI aliases
  
  Best practice: A URI owner SHOULD NOT associate arbitrarily
  different URIs with the same resource.
  
  2.4. URI Schemes
  
  Best practice: A specification SHOULD reuse an existing URI scheme
  (rather than create a new one) when it provides the desired
  properties of identifiers and their relation to resources.

True -- it's all there.

The problem is that, after setting up a non-dereferencable http: URI
to name something like an XML namespace or a CQL context set, it's
just so darned _tempting_ to put something explanatory at the location
which happens to be indicated by that URI  :-)

 _/|____
/o ) \/  Mike Taylor  m...@indexdata.com  http://www.miketaylor.org.uk
)_v__/\  You can also join us online at www.msnbc.com.  You know,
 I'm always afraid I'm going to say too many Ws. -- NBC news
 anchorman Tom Brokaw.


Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Jonathan Rochkind
Meanwhile, there are others who are arguing just as strongly that 
identifiers should _always_ be resolvable.


Seriously, this debate has been going on for a while in other forums, we 
aren't the first to have it. I can see both sides, neither seems 
obviously right to me.  Which I guess suggests that we need room for 
both resolvable identifiers and non-resolvable identifiers. (And then 
people will start arguing on whether http uri's provide all the room we 
need for non-resolvable ones or not. That argument has been had before 
too, and I see both sides there too!)


Some hints of the existing argument in other forums can be found in this 
post by Stu Weibel, and the other posts it links to.


http://weibel-lines.typepad.com/weibelines/2006/08/uncoupling_iden.html

Jonathan

Houghton,Andrew wrote:

From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Mike Taylor
Sent: Monday, March 30, 2009 11:30 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] registering info: uris?

Ross Singer writes:
  There should be no issue with having both, mainly because like I
  mentioned earlier, nobody cares about info:uris.
 
  Take, for instance, DOIs.  What do you see in the wild?  Do you ever
  see info:uris (except in OpenURLs)?  If you don't see
  http://dx.doi.org/ URIs you generally see doi:10... URIs.  It seems
  like having http and info URIs would *have* to be fine, since
  info:uris *not being dereferenceable* are far less useful (I won't
go
  so far as 'useless') on the web, which is where all this is
happening.

What on earth does dereferencing have to do with this?

We're talking about an identifier.



Exactly, that is what people don't understand about RFC 3986.  URIs are
just identifiers and have nothing to do with dereferencing.  Dereferencing
only comes into play when the URI is used with an actual protocol like 
HTTP.  The only thing the http:, e.g., URI scheme, starting the URI tells 
you is what the syntax of the rest of the URI looks like.  This is where 
the authors of info URIs missed the boat.  They conflated the URI scheme,

e.g., http:, with dereferencing and used it as a justification for a new
URI scheme.  The authors were told of that misconception before info
became an RFC by both the IETF and W3C, but they decided to proceed 
anyway creating another library specific standard that no one else will

use.

If people would just follow the prescribed practice by the W3C:

http://www.w3.org/TR/webarch/ 
Architecture of the Web says:


2.3.1. URI aliases

Best practice: A URI owner SHOULD NOT associate arbitrarily different URIs with the 
same resource.

2.4. URI Schemes

Best practice: A specification SHOULD reuse an existing URI scheme (rather than 
create a new one) when it provides the desired properties of identifiers and their 
relation to resources.

Quote: While Web architecture allows the definition of new schemes, introducing a 
new scheme is costly. Many aspects of URI processing are scheme-dependent, and a large 
amount of deployed software already processes URIs of well-known schemes. Introducing a 
new URI scheme requires the development and deployment not only of client software to 
handle the scheme, but also of ancillary agents such as gateways, proxies, and caches. 
See [RFC2718] for other considerations and costs related to URI scheme design.

http://www.w3.org/2001/tag/doc/URNsAndRegistries-50 
This TAG finding pretty much debunks all the reasons given by the info URI authors for creating a new URI scheme.  I think Erik Hetzner also referenced it in his posts.



Andy.

  


Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Mike Taylor
 Sent: Monday, March 30, 2009 12:15 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] registering info: uris?
 
 The problem is that, after setting up a non-dereferencable http: URI
 to name something like an XML namespace or a CQL context set, it's
 just so darned _tempting_ to put something explanatory at the location
 which happens to be indicated by that URI  :-)

and that is what you are supposed to do...  Having a representation of
the thing is useful and is what makes the Web and any other hypertext
system useful.


Andy.


Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Jonathan Rochkind
 Sent: Monday, March 30, 2009 12:16 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] registering info: uris?
 
 Some hints of the existing argument in other forums can be found in
 this post by Stu Weibel, and the other posts it links to.
 
 http://weibel-lines.typepad.com/weibelines/2006/08/uncoupling_iden.html

Unfortunately, Stu is an author of the info URI specification and he
makes the same arguments that they made to justify the info URI RFC,
which have been debunked by the W3C:

http://www.w3.org/2001/tag/doc/URNsAndRegistries-50

Having unresolvable URIs is anti-Web since the Web is a hypertext
system where links are required to make it useful.  Exposing
unresolvable links in content on the Web doesn't make the Web 
more useful.


Andy.


Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Erik Hetzner
At Mon, 30 Mar 2009 10:12:39 -0400,
Ray Denenberg, Library of Congress wrote:
 Leaving aside religious issues I just want to be  sure we're clear on one
 point: the work required for the info URI process is exactly the amount of
 work required, no more no less.  It forces you to specify clear syntax and
 semantics, normalization (if applicable), etc.  If you go a different route
 because it's less work, then you're probably avoiding doing work that needs
 to be done.

Reading over your previous message regarding mapping SuDocs syntax to
URI syntax, I completely agree about the necessity of clarifying these
rules.

But I was referring to the bureaucratic overhead (little though it
may be) in registering an info: URI. This overhead may or may not be
useful, but it is there, including a submission process, internal
review, and public comments (according to the draft info URI registry
policy).

-Erik
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Ross Singer
I agree with this as well.  I guess it just depends on whether you
think this needs to be done prior to facilitating the process to mint
URIs or after.

The advantage to the former is that it will actually get documented.

Speaking of, if anybody wants to help formalize this for the purl
method, I'll be happy to work on it with somebody.

-Ross.

On Mon, Mar 30, 2009 at 1:40 PM, Erik Hetzner erik.hetz...@ucop.edu wrote:
 At Mon, 30 Mar 2009 10:12:39 -0400,
 Ray Denenberg, Library of Congress wrote:
 Leaving aside religious issues I just want to be  sure we're clear on one
 point: the work required for the info URI process is exactly the amount of
 work required, no more no less.  It forces you to specify clear syntax and
 semantics, normalization (if applicable), etc.  If you go a different route
 because it's less work, then you're probably avoiding doing work that needs
 to be done.

 Reading over your previous message regarding mapping SuDocs syntax to
 URI syntax, I completely agree about the necessity of clarifying these
 rules.

 But I was referring to the bureaucratic overhead (little though it
 may be) in registering an info: URI. This overhead may or may not be
 useful, but it is there, including a submission process, internal
 review, and public comments (according to the draft info URI registry
 policy).

 -Erik

 ;; Erik Hetzner, California Digital Library
 ;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Jonathan Rochkind
It's interesting that there are at least three, if not four, viewpoints 
being represented in this conversation.


The first argument is over whether all identifiers should be resolvable 
or not.  While I respect the argument that it's _useful_ to have 
resolvable (to something) identifiers, I think it's an unnecessary 
limitation to say that all identifiers _must_ be resolvable. There are 
cases where it is infeasible on a business level to support 
resolvability.  It may be for as simple a reason as that the body who 
actually maintains the identifiers is not interested in providing such 
at present.  You can argue that they _ought_ to be, but back in the real 
world, should that stand as a barrier to anyone else using URI 
identifiers based on that particular identifier system?  Wouldn't it be 
better if it didn't have to be?


[ Another obvious example is the SICI -- an identifier for a particular 
article in a serial. Making these all resolvable in a useful way is a 
VERY non-trivial exercise. It is not at all easy, and a solution is 
definitely not cheap (DOI is an attempted solution; which some 
publishers choose not to pay for; both the DOI fees and the cost of 
building out their own infrastructure to support it). Why should we be 
prevented from using identifiers for a particular article in a serial 
until this difficult and expensive problem is solved?]


So I don't buy that all identifiers must always be resolvable, and that 
if we can't make an identifier resolvable we can't use it. That excludes 
too much useful stuff.


The next argument is, okay, so maybe all identifiers don't have to be 
resolvable, but even if it's not resolvable you can still use an http 
uri for it, just one that doesn't actually resolve.  Formally, this is 
certainly correct. There's no formal requirement that an http URI go 
anywhere, that there even be an HTTP server responding at the hostname 
mentioned _at all_. So you _could_ use an http uri like that.   But it 
gets confusing quickly, in part because the first argument referenced is 
still going on, and some people assume that any http URI _ought_ to be 
resolvable (to _something_; to _what_ is another argument).  Using a 
non-http uri is a way to avoid confusion over your intentions, stating 
that you acknowledged from the start that it was infeasible at the 
present time to provide http resolution for these identifiers.


Jonathan

Houghton,Andrew wrote:

From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Jonathan Rochkind
Sent: Monday, March 30, 2009 12:16 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] registering info: uris?

Some hints of the existing argument in other forums can be found in
this
post by Stu Weibel, and the other posts it links to.

http://weibel-lines.typepad.com/weibelines/2006/08/uncoupling_iden.html



Unfortunately, Stu is an author of the info URI specification and the
he makes the same arguments that they made for the justification of
the info URI RFC which has been debunked by the W3C:

http://www.w3.org/2001/tag/doc/URNsAndRegistries-50

Having unresolvable URIs is anti-Web since the Web is a hypertext
system where links are required to make it useful.  Exposing
unresolvable links in content on the Web doesn't make the Web 
more useful.



Andy.

  


Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Hilmar Lapp

On Mar 30, 2009, at 11:18 AM, Ray Denenberg, Library of Congress wrote:


From: Ross Singer rossfsin...@gmail.com

nobody gives a damn about info:uris outside of
libraries,


Nor do people outside of libraries care about identifiers.


You might be surprised: http://www.lsrn.org/

-hilmar
--
===
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===


Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Erik Hetzner
At Mon, 30 Mar 2009 13:58:04 -0400,
Jonathan Rochkind wrote:
 
 It's interesting that there are at least three, if not four, viewpoints 
 being represented in this conversation.
 
 The first argument is over whether all identifiers should be resolvable 
 or not.  While I respect the argument that it's _useful_ to have 
 resolvable (to something) identifiers, I think it's an unnecessary 
 limitation to say that all identifiers _must_ be resolvable. There are 
 cases where it is infeasible on a business level to support 
 resolvability.  It may be for as simple a reason as that the body who 
 actually maintains the identifiers is not interested in providing such 
 at present.  You can argue that they _ought_ to be, but back in the real 
 world, should that stand as a barrier to anyone else using URI 
 identifiers based on that particular identifier system?  Wouldn't it be 
 better if it didn't have to be?

 [ Another obvious example is the SICI -- an identifier for a particular 
 article in a serial. Making these all resolvable in a useful way is a 
 VERY non-trivial exercise. It is not at all easy, and a solution is 
 definitely not cheap (DOI is an attempted solution; which some 
 publishers choose not to pay for; both the DOI fees and the cost of 
 building out their own infrastructure to support it). Why should we be 
 prevented from using identifiers for a particular article in a serial 
 until this difficult and expensive problem is solved?]
 
 So I don't buy that all identifiers must always be resolvable, and that 
 if we can't make an identifier resolvable we can't use it. That excludes 
 too much useful stuff.

I don’t actually think that there is anybody who is arguing that all
identifiers must be resolvable. There are people who argue that there
are identifiers which must NOT be resolvable; at least in their basic
form. (see Stuart Weibel [1]).
 
 […]

best,
Erik

1. http://weibel-lines.typepad.com/weibelines/2006/08/uncoupling_iden.html
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Ray Denenberg, Library of Congress

From: Hilmar Lapp hl...@duke.edu

Nor do people outside of libraries care about identifiers.


You might be surprised: http://www.lsrn.org/


Yes, I overstated; let me rephrase. There are communities who are 
interested in specific object classes and want identifier schemes for them. 
For libraries there are books, articles, journals, and many others. And 
certainly this isn't limited to libraries; for example, many scientific 
disciplines have a similar interest in identifier schemes for objects in 
specific object classes.


But the term "identifier" has taken on a whole new meaning with the web.
It has now been generalized to identify any resource, and we don't even
have a clear definition of "resource", aside from the convoluted "anything
that can be identified".  The discussions on this are often a convoluted
mess, and it's no wonder location and identity get confused.  And because
of all the emphasis on solving this part of the web architecture (which
hasn't been accomplished, and there is debate within the W3C whether it is
even possible), the original concept of identifier seems to be lost, aside
from within the communities I alluded to above.  And it is for those
communities that the info URI is useful.


Now as to my reference to religious issues, a statement like "Having
unresolvable URIs is anti-Web" would be better stated as: "Having
unresolvable URIs IN MY OPINION is anti-Web."  It is an opinion, not a fact.
Stating it as fact is dogmatic.  It is a reasonable opinion; however, my
opinion, "Having unresolvable URIs IN MY OPINION is PRO-Web", is just as
reasonable.  I needn't go into further detail; we've beaten this to death
already.


--Ray


Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Jonathan Rochkind

Erik Hetzner wrote:


I don’t actually think that there is anybody who is arguing that all
identifiers must be resolvable. There are people who argue that there
are identifiers which must NOT be resolvable; at least in their basic
form. (see Stuart Weibel [1]).
  


There are indeed people arguing that, Erik, on this very list. Like, in 
the email I responded to (did you read that one?).  That's why I wrote 
what I did, man! You know I'm the one who cited Stu's argument first on 
this list! I am aware of his arguments. I am aware of people arguing 
various things on this issue.


But when did someone suggest that all identifiers must be resolvable? 
When Andrew argued that:



Having unresolvable URIs is anti-Web since the Web is a hypertext
system where links are required to make it useful.  Exposing
unresolvable links in content on the Web doesn't make the Web 
more useful.



Okay, I guess he didn't actually SAY that you should never have non-resolvable identifiers, but he rather strongly implied it, by using the anti-Web epithet. 

But now we're arguing about what we're arguing about, which is the sure sign that an internet argument should die. 


Suffice it to say that there are at LEAST three viewpoints (if not more) being expressed 
in this argument, it's not just two sides.  And that, I agree with Ray, these are NOT 
entirely solved questions, the right answer is not always obvious, reasonable 
people can disagree. (I happen to think there are a handful of clear WRONG answers, but 
also a variety of competing potentially right ones.)




Jonathan


Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Jonathan Rochkind
 Sent: Monday, March 30, 2009 3:52 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] registering info: uris?
 
 But when did someone suggest that all identifiers must be resolvable?
 When Andrew argued that:
 
  Having unresolvable URIs is anti-Web since the Web is a hypertext
  system where links are required to make it useful.  Exposing
  unresolvable links in content on the Web doesn't make the Web
  more useful.
 
 Okay, I guess he didn't actually SAY that you should never have non-
 resolvable identifiers, but he rather strongly implied it, by using the
 anti-Web epithet.

You are correct that I didn't say that you should never have unresolvable
identifiers and I wasn't implying that either.  Though I was pointing out
that sticking <a href="info:lccn/sh2009123456">Text</a> into the hypertext
system where info URIs are unresolvable negates the effect of linking to it
in the first place.
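
A rough sketch of how one might spot such links in a page, assuming
Python's standard html.parser and urllib.parse modules.  The LinkCollector
class, the non_followable() helper, and the choice of http/https as the
only "followable" schemes are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> element in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def non_followable(html, followable=("http", "https")):
    """Return hrefs with an explicit scheme a browser is unlikely to follow.

    ASSUMPTION: only http/https count as followable; relative links
    (no scheme) are skipped because they resolve against the page URL.
    """
    collector = LinkCollector()
    collector.feed(html)
    result = []
    for href in collector.hrefs:
        scheme = urlsplit(href).scheme
        if scheme and scheme not in followable:
            result.append(href)
    return result


if __name__ == "__main__":
    page = (
        '<p><a href="info:lccn/sh2009123456">Text</a> '
        '<a href="http://example.org/x">ok</a> '
        '<a href="/relative">rel</a></p>'
    )
    print(non_followable(page))
```

This only flags the links; it says nothing about whether the same info URI
is perfectly serviceable elsewhere, for example as a value in a metadata
record.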


Andy.


Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Jonathan Rochkind
There are obviously other uses for URIs than sticking them in an 'href' 
attribute of an <a>. Like, the uses I thought this conversation was about?


What are we talking about again?

Houghton,Andrew wrote:

From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Jonathan Rochkind
Sent: Monday, March 30, 2009 3:52 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] registering info: uris?

But when did someone suggest that all identifiers must be resolvable?
When Andrew argued that:



Having unresolvable URIs is anti-Web since the Web is a hypertext
system where links are required to make it useful.  Exposing
unresolvable links in content on the Web doesn't make the Web
more useful.
  

Okay, I guess he didn't actually SAY that you should never have non-
resolvable identifiers, but he rather strongly implied it, by using the
anti-Web epithet.



You are correct that I didn't say that you should never have unresolvable
identifiers and I wasn't implying that either.  Though I was pointing out
that sticking <a href="info:lccn/sh2009123456">Text</a> into the hypertext
system where info URIs are unresolvable negates the effect of linking to it
in the first place.


Andy.
  


Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Erik Hetzner
At Mon, 30 Mar 2009 15:52:10 -0400,
Jonathan Rochkind wrote:
 
 Erik Hetzner wrote:
 
  I don’t actually think that there is anybody who is arguing that all
  identifiers must be resolvable. There are people who argue that there
  are identifiers which must NOT be resolvable; at least in their basic
  form. (see Stuart Weibel [1]).
 
 There are indeed people arguing that, Erik, on this very list. Like,
 in the email I responded to (did you read that one?). That's why I
 wrote what I did, man! You know I'm the one who cited Stu's argument
 first on this list! I am aware of his arguments. I am aware of
 people arguing various things on this issue.

My apologies for missing Andrew’s argument and not pointing out that
you had originally pointed to Stuart’s argument.
 
 But when did someone suggest that all identifiers must be resolvable? 
 When Andrew argued that:
 
  Having unresolvable URIs is anti-Web since the Web is a hypertext
  system where links are required to make it useful.  Exposing
  unresolvable links in content on the Web doesn't make the Web 
  more useful.

 Okay, I guess he didn't actually SAY that you should never have
 non-resolvable identifiers, but he rather strongly implied it, by
 using the anti-Web epithet.

Given Andrew’s later response, I would like to restate my previous
argument:

I don’t [] think that there is anybody who is +seriously+ arguing that
all identifiers must be resolvable +to be useful as identifiers+.

best,
Erik
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




[CODE4LIB] registering info: uris?

2009-03-27 Thread Jonathan Rochkind

Does anyone know the process for registering a sub-scheme for info: uris?

I'd like to have one for SuDoc classification numbers, info:sudoc/.

I'm not sure if I can register that on my own, without working with the 
US Government Printing Office, who actually maintains sudocs.  But if I 
have to get GPO to do it, I'll probably give up quicker (unless it turns 
out easier than I thought to find the right person at GPO and get them 
to sign on -- I doubt it!). Or if the registration process is really 
long and onerous.


But if it's easy enough to just fill out a form and get info:sudoc 
registered, I'd rather it be legal than use things that look like an 
info uri but really aren't a legally registered sub-scheme.


Anyone know?

Jonathan


Re: [CODE4LIB] registering info: uris?

2009-03-27 Thread Jonathan Rochkind

Thanks Ray.

Oh boy, I don't know enough about SuDoc to describe the syntax rules 
fully. I can spend some more time with the SuDoc documentation (written 
for a pre-computer era) and try to figure it out, or do the best I can.  
I mean, the info registration can clearly point to the existing SuDoc 
documentation and say "one of these" -- but actually describing the 
syntax formally may or may not be possible/easy/possible-for-me-personally.


I can't even tell if normalization would be required or not. I don't 
think so.  I think SuDocs don't suffer from the problem that made LCCNs 
require normalization; I think they already have a consistent form, but 
I'm not certain.


I'll see what I can do with it. 

But Ray, you work for 'the government'.   Do you have a relationship 
with a counter-part at GPO that might be interested in getting involved 
with this?


Jonathan

Ray Denenberg, Library of Congress wrote:

It's a fairly straightforward process,  See:
http://info-uri.info/registry/register.html

You should look at a few examples first, go to 
http://info-uri.info/registry/  and click on a few of those listed in the 
left column.


I think registering one for SuDocs would be fairly easy.

The info folks are most concerned that the syntax rules are well-described. 
I had registered a few of these before they started cracking the whip on 
that (and rightly so), and when I registered info:lc it became more 
difficult; you might want to look at that for an example:

http://info-uri.info/registry/OAIHandler?verb=GetRecord&metadataPrefix=reg&identifier=info:lc/

Also, normalization - I suggested looking at info:lccn normalization rules:
http://info-uri.info/registry/OAIHandler?verb=GetRecord&metadataPrefix=reg&identifier=info:lccn/

--Ray


- Original Message - 
From: Jonathan Rochkind rochk...@jhu.edu

To: CODE4LIB@LISTSERV.ND.EDU
Sent: Friday, March 27, 2009 3:12 PM
Subject: [CODE4LIB] registering info: uris?


  

Does anyone know the process for registering a sub-scheme for info: uris?

I'd like to have one for SuDoc classification numbers, info:sudoc/.

I'm not sure if I can register that on my own, without working with the US 
Government Printing Office, who actually maintains sudocs.  But if I have 
to get GPO to do it, I'll probably give up quicker (unless it turns out 
easier than I thought to find the right person at GPO and get them to sign 
on -- I doubt it!). Or if the registration process is really long and 
onerous.


But if it's easy enough to just fill out a form and get info:sudoc 
registered, I'd rather it be legal than use things that look like an info 
uri but really aren't a legally registered sub-scheme.


Anyone know?

Jonathan 



  


Re: [CODE4LIB] registering info: uris?

2009-03-27 Thread Ray Denenberg, Library of Congress
Pointing to the documentation and saying "one of these" isn't going to work,
I'm afraid.  Most important is to make sure that the syntax is consistent
with URI syntax.  Where the syntax of the identifier you're representing is
potentially at odds with URI syntax, you might have to make adjustments,
like percent-encoding.  So if you're going to register sudoc, you're going to
have to understand the syntax to some degree; there's really no way around
it.  (I didn't know the lccn syntax; registering it forced me to learn it,
and I'm a better man for it.)


I don't know much about SuDoc, and most everything seems to point to 
http://www.gpo.gov/su_docs/fdlp/pubs/explain.html which doesn't really 
explain their syntax. (Though if you look a bit harder maybe you'll find 
something better.)


But I see this example:  Y 3.C 76/3:2 K 54

That's apparently a sudoc.  It immediately raises the following flags:
spaces, slash, colon, and case (sensitivity).  For your purposes I don't
think that colon or slash is a problem.  (They become a problem when you are
using them as special characters for delimitation, but you're not doing
that.)  Spaces, though, have to be percent-encoded.  (That simply means
replacing each occurrence of a space with %20.)
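
The encoding step could look roughly like this, assuming Python's standard
urllib.parse module.  The info:sudoc/ prefix is the namespace proposed in
this thread, not a registered one, and both function names are made up for
the sketch.

```python
from urllib.parse import quote, unquote

# ASSUMPTION: "info:sudoc/" is the namespace proposed in this thread;
# it is not a registered info: namespace.
PREFIX = "info:sudoc/"


def sudoc_to_info_uri(sudoc):
    """Percent-encode a SuDoc number, leaving slash and colon as they are."""
    return PREFIX + quote(sudoc, safe="/:")


def info_uri_to_sudoc(uri):
    """Reverse the encoding to recover the original SuDoc string."""
    if not uri.startswith(PREFIX):
        raise ValueError("not an info:sudoc/ URI: %r" % uri)
    return unquote(uri[len(PREFIX):])


if __name__ == "__main__":
    uri = sudoc_to_info_uri("Y 3.C 76/3:2 K 54")
    print(uri)  # info:sudoc/Y%203.C%2076/3:2%20K%2054
    print(info_uri_to_sudoc(uri))
```

Slash and colon are passed through unencoded here because, as noted above,
they are not being used as delimiters within the identifier.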


You also need to look at case-sensitivity.  If sudocs are case-sensitive, no 
problem; if not, then you may want to normalize to either upper or lower 
case.


There may not be any normalization issues (other than case sensitivity, if 
that).   Normalization is an issue only if a particular sudoc can be 
represented by more than one string.   If so you have two choices:

1. prescribe a canonical form (which is the approach we took for LCCNs).
2.  simply describe the rules for determining when two strings represent the 
same sudoc (there is no rule that says that two different info URIs can't 
refer to the same resource).
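
The two choices can be sketched as follows.  The normalization rules used
here (collapse runs of whitespace, fold to upper case) are placeholders
chosen for illustration, NOT documented SuDoc rules; the real rules would
have to come from GPO's documentation.

```python
import re


def canonical_sudoc(sudoc):
    """Choice 1: map every spelling of a SuDoc to one canonical string.

    ASSUMPTION: whitespace runs and letter case are not significant.
    These are placeholder rules, not documented SuDoc behaviour.
    """
    collapsed = re.sub(r"\s+", " ", sudoc.strip())
    return collapsed.upper()


def same_sudoc(a, b):
    """Choice 2: an equivalence test, with no required stored form.

    Two strings are treated as the same SuDoc when they agree after the
    placeholder rules above are applied.
    """
    return canonical_sudoc(a) == canonical_sudoc(b)


if __name__ == "__main__":
    print(canonical_sudoc("  y 3.c  76/3:2 k 54 "))
    print(same_sudoc("Y 3.C 76/3:2 K 54", "y 3.c 76/3:2  k 54"))
    print(same_sudoc("Y 3.C 76/3:2 K 54", "Y 3.C 76/3:2 K 55"))
```

With choice 1 the registration prescribes the canonical string and only
that form is a valid URI; with choice 2 any spelling is valid and consumers
apply the comparison rule themselves.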


You can contact me privately if you have problems.

No, sorry, I don't know anyone at GPO.  I worked the graveyard shift there 
part time during college.  (I had to load mailing machines with junk mail. 
Several junk items loaded into a machine which would combine them into one 
mailing item. The machine would jam about every tenth time. Worst job I ever 
had.) But that was many years ago and that's the last contact I've had with 
GPO.


Good luck.

-Ray

- Original Message - 
From: Jonathan Rochkind rochk...@jhu.edu

To: CODE4LIB@LISTSERV.ND.EDU
Sent: Friday, March 27, 2009 3:36 PM
Subject: Re: [CODE4LIB] registering info: uris?



Thanks Ray.

Oh boy, I don't know enough about SuDoc to describe the syntax rules 
fully. I can spend some more time with the SuDoc documentation (written 
for a pre-computer era) and try to figure it out, or do the best I can.  I 
mean, the info registration can clearly point to the existing SuDoc 
documentation and say one of these -- but actually describing the syntax 
formally may or may not be possible/easy/possible-for-me-personally.


I can't even tell if normalization would be required or not. I don't think 
so.  I think SuDocs don't suffer from that problem LCCNs did to require 
normalization, I think they already have consistent form,  but I'm not 
certain.


I'll see what I can do with it.
But Ray, you work for 'the government'.   Do you have a relationship with 
a counter-part at GPO that might be interested in getting involved with 
this?


Jonathan

Ray Denenberg, Library of Congress wrote:

It's a fairly straightforward process,  See:
http://info-uri.info/registry/register.html

You should look at a few examples first, go to 
http://info-uri.info/registry/  and click on a few of those listed in the 
left column.


I think registering one for SuDocs would be fairly easy.

The info folks are most concerned that the syntax rules are 
well-described. I had registered a few of these before they started 
cracking the whip on that (and rightly so), and when I registered info:lc 
it became more difficult; you might want to look at that for an example:

http://info-uri.info/registry/OAIHandler?verb=GetRecord&metadataPrefix=reg&identifier=info:lc/

Also, normalization - I suggested looking at info:lccn normalization 
rules:

http://info-uri.info/registry/OAIHandler?verb=GetRecord&metadataPrefix=reg&identifier=info:lccn/

--Ray


- Original Message - 
From: Jonathan Rochkind rochk...@jhu.edu

To: CODE4LIB@LISTSERV.ND.EDU
Sent: Friday, March 27, 2009 3:12 PM
Subject: [CODE4LIB] registering info: uris?



Does anyone know the process for registering a sub-scheme for info: 
uris?


I'd like to have one for SuDoc classification numbers, info:sudoc/.

I'm not sure if I can register that on my own, without working with the 
US Government Printing Office, who actually maintains sudocs.  But if I 
have to get GPO to do it, I'll probably give up quicker (unless it turns 
out easier than I thought to find the right person at GPO and get them 
to sign on -- I doubt it!). Or if the registration process is really 
long

Re: [CODE4LIB] registering info: uris?

2009-03-27 Thread Erik Hetzner
At Fri, 27 Mar 2009 15:36:43 -0400,
Jonathan Rochkind wrote:
 
 Thanks Ray.
 
 Oh boy, I don't know enough about SuDoc to describe the syntax rules 
 fully. I can spend some more time with the SuDoc documentation (written 
 for a pre-computer era) and try to figure it out, or do the best I can.  
 I mean, the info registration can clearly point to the existing SuDoc 
 documentation and say one of these -- but actually describing the 
 syntax formally may or may not be possible/easy/possible-for-me-personally.
 
 I can't even tell if normalization would be required or not. I don't 
 think so.  I think SuDocs don't suffer from that problem LCCNs did to 
 require normalization, I think they already have consistent form,  but 
 I'm not certain.
 
 I'll see what I can do with it. 
 
 But Ray, you work for 'the government'.   Do you have a relationship 
 with a counter-part at GPO that might be interested in getting involved 
 with this?

Hi Jonathan -

Obviously I don’t know your requirements, but I’d like to suggest that
before going down the info: URI road, you read the W3C Technical
Architecture Group’s finding ‘URNs, Namespaces and Registries’ [1].

| Abstract

| This finding addresses the questions "When should URNs or URIs with
| novel URI schemes be used to name information resources for the
| Web?" and "Should registries be provided for such identifiers?". The
| answers given are "Rarely if ever" and "Probably not". Common
| arguments in favor of such novel naming schemas are examined, and
| their properties compared with those of the existing http: URI
| scheme.

| Three case studies are then presented, illustrating how the http:
| URI scheme can be used to achieve many of the stated requirements
| for new URI schemes.

best,
Erik Hetzner

1. http://www.w3.org/2001/tag/doc/URNsAndRegistries-50
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] registering info: uris?

2009-03-27 Thread Jonathan Rochkind
Yeah, I thought of the URI encoding issue, that's easy enough to deal 
with, makes sense.


I have no idea how to tell if SuDocs are case sensitive or not. But they 
ARE all assigned by the GPO, and look-up-able in the GPO catalog.  Yeah, 
they have to be URL encoded, certainly, but can't we just say "must be a 
valid SuDoc class (including book number) assigned by the GPO, but [url 
encode it]"?  This can't be the only use case for essentially arbitrary 
strings assigned by a third party controlling authority, that you want 
to make into an info: uri, right?  

But maybe I'll try doing the best I can, with or without GPO assistance 
(Ed Summers said he thought he might know somebody at GPO interested in 
identifiers), and maybe run it by you? 

If this ends up being a huge time sink -- I'm probably going to give up, 
and just use my own illegal info:sudoc identifiers that aren't really 
registered at all, which would be bad, but I need a sudoc URI and don't 
have a huge amount of time to sink into doing it 'right'.


Believe me, I have already spent quite a bit of time with that document 
you reference. It was written for an earlier era, clearly.


Jonathan

Ray Denenberg, Library of Congress wrote:
Pointing to the documentation and saying one of these isn't going to work, 
I'm afraid.   Most important is to make sure that the syntax is consistent 
with URI syntax.  Where the syntax of the identifier you're representing is 
potentially at odds with URI syntax, you  might have to make adjustments, 
like percent-encode. So if you're going to register sudoc, you're going to 
have to understand the syntax to some degree, there's really no way around 
it. (I didn't know the lccn syntax, registering it forced me to learn it, 
and I'm a better man for it.)


I don't know much about SuDoc, and most everything seems to point to 
http://www.gpo.gov/su_docs/fdlp/pubs/explain.html which doesn't really 
explain their syntax. (Though if you look a bit harder maybe you'll find 
something better.)


But I see this example:  Y 3.C 76/3:2 K 54

That's apparently a sudoc.  It immediately raises the following flags:
spaces, slash, colon, and case (sensitivity).  For your purposes I don't
think that colon or slash is a problem.  (They become a problem when you are
using them as special characters for delimitation, but you're not doing
that.)  Spaces, though, have to be percent-encoded.  (That simply means
replacing each occurrence of a space with %20.)


You also need to look at case-sensitivity. If sudocs are case-sensitive, no 
problem, if not, then you may want to normalize to either upper or lower 
case.


There may not be any normalization issues (other than case sensitivity, if 
that).   Normalization is an issue only if a particular sudoc can be 
represented by more than one string.   If so you have two choices:

1. prescribe a canonical form (which is the approach we took for LCCNs).
2.  simply describe the rules for determining when two strings represent the 
same sudoc (there is no rule that says that two different info URIs can't 
refer to the same resource).
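
The two choices can be sketched in Python as follows. The rules used here (trim, collapse runs of whitespace, fold to upper case) are assumptions made for illustration; whether SuDoc numbers are actually case-insensitive or whitespace-insensitive would have to be checked against the SuDoc documentation.

```python
def canonical_sudoc(raw: str) -> str:
    """Choice 1: reduce a string to a single (assumed) canonical form."""
    # Assumed rules: strip the ends, collapse whitespace runs to one space,
    # and fold to upper case.
    return " ".join(raw.split()).upper()


def same_sudoc(a: str, b: str) -> bool:
    """Choice 2: leave strings as given, but define when two are equivalent."""
    return canonical_sudoc(a) == canonical_sudoc(b)


print(canonical_sudoc("  y 3.c  76/3:2 k 54 "))
# Y 3.C 76/3:2 K 54
print(same_sudoc("Y 3.C 76/3:2 K 54", "y 3.c 76/3:2  k 54"))
# True
```

With choice 1 only the canonical string goes into the URI; with choice 2 several URIs may exist for one sudoc and consumers apply the comparison themselves.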


You can contact me privately if you have problems.

No, sorry, I don't know anyone at GPO.  I worked the graveyard shift there 
part time during college.  (I had to load mailing machines with junk mail. 
Several junk items loaded into a machine which would combine them into one 
mailing item. The machine would jam about every tenth time. Worst job I ever 
had.) But that was many years ago and that's the last contact I've had with 
GPO.


Good luck.

-Ray

- Original Message - 
From: Jonathan Rochkind rochk...@jhu.edu

To: CODE4LIB@LISTSERV.ND.EDU
Sent: Friday, March 27, 2009 3:36 PM
Subject: Re: [CODE4LIB] registering info: uris?


  

Thanks Ray.

Oh boy, I don't know enough about SuDoc to describe the syntax rules 
fully. I can spend some more time with the SuDoc documentation (written 
for a pre-computer era) and try to figure it out, or do the best I can.  I 
mean, the info registration can clearly point to the existing SuDoc 
documentation and say "one of these" -- but actually describing the syntax 
formally may or may not be possible/easy/possible-for-me-personally.


I can't even tell if normalization would be required or not. I don't think 
so.  I think SuDocs don't suffer from that problem LCCNs did to require 
normalization, I think they already have consistent form,  but I'm not 
certain.


I'll see what I can do with it.
But Ray, you work for 'the government'.   Do you have a relationship with 
a counter-part at GPO that might be interested in getting involved with 
this?


Jonathan

Ray Denenberg, Library of Congress wrote:


It's a fairly straightforward process,  See:
http://info-uri.info/registry/register.html

You should look at a few examples first, go to 
http://info-uri.info/registry/  and click on a few of those listed in the 
left column.


I think registering one for SuDocs would be fairly easy.

The info folks are most

Re: [CODE4LIB] registering info: uris?

2009-03-27 Thread Jonathan Rochkind
I am looking for the easiest possible way to get a legal URI 
representing a sudoc.


My understanding, after looking at this stuff previously, is that info: 
is a LOT lower barrier than urn:, and that's part of its purpose.


Before Ed or someone else mentions http, to me, using http: URIs would 
only make sense if the GPO were actually interested in supporting such 
in a persistent way. I don't really want to have to go down that road 
just to get a legal URI for a sudoc, but if someone else does, please 
feel free. :)


Jonathan

Erik Hetzner wrote:

At Fri, 27 Mar 2009 15:36:43 -0400,
Jonathan Rochkind wrote:
  

Thanks Ray.

Oh boy, I don't know enough about SuDoc to describe the syntax rules 
fully. I can spend some more time with the SuDoc documentation (written 
for a pre-computer era) and try to figure it out, or do the best I can.  
I mean, the info registration can clearly point to the existing SuDoc 
documentation and say "one of these" -- but actually describing the 
syntax formally may or may not be possible/easy/possible-for-me-personally.


I can't even tell if normalization would be required or not. I don't 
think so.  I think SuDocs don't suffer from that problem LCCNs did to 
require normalization, I think they already have consistent form,  but 
I'm not certain.


I'll see what I can do with it. 

But Ray, you work for 'the government'.   Do you have a relationship 
with a counter-part at GPO that might be interested in getting involved 
with this?



Hi Jonathan -

Obviously I don’t know your requirements, but I’d like to suggest that
before going down the info: URI road, you read the W3C Technical
Architecture Group’s finding ‘URNs, Namespaces and Registries’ [1].

| Abstract

| This finding addresses the questions "When should URNs or URIs with
| novel URI schemes be used to name information resources for the
| Web?" and "Should registries be provided for such identifiers?". The
| answers given are "Rarely if ever" and "Probably not". Common
| arguments in favor of such novel naming schemas are examined, and
| their properties compared with those of the existing http: URI
| scheme.

| Three case studies are then presented, illustrating how the http:
| URI scheme can be used to achieve many of the stated requirements
| for new URI schemes.

best,
Erik Hetzner

1. http://www.w3.org/2001/tag/doc/URNsAndRegistries-50
  



;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3
  


Re: [CODE4LIB] registering info: uris?

2009-03-27 Thread Jonathan Rochkind
True, good point. I am looking for something a _bit_ more shareable 
between other software and institutions than tag. info: still seems a 
nice compromise to me.


Houghton,Andrew wrote:

From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Jonathan Rochkind
Sent: Friday, March 27, 2009 4:42 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] registering info: uris?

I am looking for the easiest possible way to get a legal URI
representing a sudoc.

My understanding, after looking at this stuff previously, is that info:
is a LOT lower barrier than urn:, and that's part of its purpose.



Jonathan, you could use tag URIs, RFC 4151, if you are looking for something
quick and dirty.  No need to register with any authority since you are using
your own DNS name.

http://tools.ietf.org/html/rfc4151


Andy.
  


Re: [CODE4LIB] registering info: uris?

2009-03-27 Thread Jonathan Rochkind
Aha, cool!  Yeah, I could use tag for this, but it wouldn't seem 
appropriate for something I want to encourage others to use compatibly 
as well, info seems better.


Houghton,Andrew wrote:

From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Jonathan Rochkind
Sent: Friday, March 27, 2009 4:52 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] registering info: uris?

Also, the date aspect of a tag-uri seems to make it hard to use to mint
an identifier that will always represent the same SuDoc, regardless of
when it was minted.



No the date part is a versioning scheme, not the date you created the
tag URI.  It's used, for example, where I created a specific tag
scheme one day and then decided to create another tag scheme some
other day:

tag:example.org,1999:date/yy-mm-dd

where yy-mm-dd is the year, month and day values.  Then I realize that
it's Y2K so I create a new tag scheme:

tag:example.org,2000:date/yyyy-mm-dd
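
For illustration, a minimal Python sketch that assembles a tag URI in the RFC 4151 layout (tag:, authority name, comma, date, colon, specific part). The function name and the checks are assumptions made for this sketch, not part of any library; the future-date check is coarse and looks at the year only.

```python
import re
from datetime import date

# RFC 4151 allows YYYY, YYYY-MM or YYYY-MM-DD in the date position.
_TAG_DATE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")


def make_tag_uri(authority: str, tag_date: str, specific: str) -> str:
    """Build a tag URI from an authority name, a date and a specific part."""
    if not _TAG_DATE.match(tag_date):
        raise ValueError("date must be YYYY, YYYY-MM or YYYY-MM-DD")
    # Coarse check (year only): the date must not lie in the future.
    if int(tag_date[:4]) > date.today().year:
        raise ValueError("date must not be in the future")
    return f"tag:{authority},{tag_date}:{specific}"


print(make_tag_uri("example.org", "2000", "date/2000-01-01"))
# tag:example.org,2000:date/2000-01-01
```

The date is fixed when the scheme is set up, so the same inputs always give the same string no matter when the function is run.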


Andy.
  


Re: [CODE4LIB] registering info: uris?

2009-03-27 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Jonathan Rochkind
 Sent: Friday, March 27, 2009 5:00 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] registering info: uris?
 
 Aha, cool!  Yeah, I could use tag for this, but it wouldn't seem
 appropriate for something I want to encourage others to use compatibly
 as well, info seems better.

Not to push tag URIs on you, just providing some information,
but if you are working with other organizations, you could 
just go to GoDaddy and get a domain name for your project, 
then use an email address instead of ND.EDU:

tag:project-n...@my-tags.org,2009:id/sudoc-value


Andy.


Re: [CODE4LIB] registering info: uris?

2009-03-27 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Jonathan Rochkind
 Sent: Friday, March 27, 2009 5:28 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] registering info: uris?
 
 Another good idea, true. There are indeed lots of ways to do this.
 
 But wait, you don't need a unique hostname for a tag uri, a unique uri
 (hostname+path) will do? purl.org will only give me the latter, not the
 former, right?

Tag URIs require that the authorizing agency own the domain name and they
cannot specify a date that is before their domain registration or in the
future.  So nobody could mint Tag URIs with purl.org as the domain name.

PURLs might be an interesting solution for you if GPO has a system where
you can resolve SuDoc identifiers.  Then you could create a PURL and point
it to their system.  Now you get to use your PURL for your project and as
a side benefit get lookup capabilities from GPO!  Otherwise you could just
send them to a relevant page on the GPO site.


Andy.


Re: [CODE4LIB] registering info: uris?

2009-03-27 Thread Ray Denenberg, Library of Congress
Correct me if I'm wrong but isn't the point of all this to be able to put 
the URI in an OpenURL?   And info was invented (in part) to avoid putting 
http URIs in OpenURLs  (because they are complicated enough already, why 
clutter them further).  So I don't see that pursuing an http solution to 
this is very useful.   --Ray



- Original Message - 
From: Houghton,Andrew hough...@oclc.org

To: CODE4LIB@LISTSERV.ND.EDU
Sent: Friday, March 27, 2009 5:24 PM
Subject: Re: [CODE4LIB] registering info: uris?



From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Jonathan Rochkind
Sent: Friday, March 27, 2009 5:18 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] registering info: uris?

I am not interested in maintaining a sudoc.info registration, and
neither is my institution, who I wouldn't trust to maintain it (even to
the extent of not letting the DNS registration expire) after I left.


BTW, you could always use http://purl.org/ and later if you wanted
to have it resolve to something just change the PURL. 


Re: [CODE4LIB] registering info: uris?

2009-03-27 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Ray Denenberg, Library of Congress
 Sent: Friday, March 27, 2009 5:38 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] registering info: uris?
 
 Correct me if I'm wrong but isn't the point of all this to be able to
 put
 the URI in an OpenURL?   And info was invented (in part) to avoid
 putting
 http URIs in OpenURLs  (because they are complicated enough already,
 why
 clutter them further).  So I don't see that pursuing an http solution
 to
 this is very useful.   --Ray

Ray, I don't quite understand the "to avoid putting http URIs in
OpenURLs" part.  An info URI as well as an HTTP URI use the same 
encoding rules from RFC 3986, URI Generic Syntax.  So neither
has an advantage over the other.  If you have a %80%CC in your
info URI or HTTP URI then sticking it in an OpenURL will 
require it to become %2580%25CC.  So what am I missing about
your statement?
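
The double encoding can be reproduced with a small Python sketch. The `info:sudoc/` and `http://example.org/` URIs below are made-up examples, and `embed_in_query` is a hypothetical helper name; the point is only that both schemes get the same treatment when used as a query-parameter value.

```python
from urllib.parse import quote, unquote


def embed_in_query(uri: str) -> str:
    """Encode a whole URI for use as a single query-parameter value."""
    # safe="" encodes every reserved character, including "%", ":" and "/".
    return quote(uri, safe="")


print(embed_in_query("%80%CC"))
# %2580%25CC
print(embed_in_query("info:sudoc/Y%203.C"))
# info%3Asudoc%2FY%25203.C
print(embed_in_query("http://example.org/Y%203.C"))
# http%3A%2F%2Fexample.org%2FY%25203.C

# Decoding once gives back the original, still percent-encoded, URI.
print(unquote(embed_in_query("info:sudoc/Y%203.C")))
# info:sudoc/Y%203.C
```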


Andy.


Re: [CODE4LIB] registering info: uris?

2009-03-27 Thread Erik Hetzner
At Fri, 27 Mar 2009 17:18:24 -0400,
Jonathan Rochkind wrote:
 
 I am not interested in maintaining a sudoc.info registration, and 
 neither is my institution, who I wouldn't trust to maintain it (even to 
 the extent of not letting the DNS registration expire) after I left.  I 
 think even something as simple as this really needs to be committed to 
 by an organization.  So yeah, even willing to take on the 
 responsibility of owning that domain until such time as something useful 
 can be done with it, I do not have, and to me that seems like a 
 requirement, not just a nice to have.

I see your point. I believe that registering a domain would be less
work than going through an info URI registration process, but I don’t
know how difficult the info URI registration process would be (thus
bringing the conversation full circle). [1]
 
 But it certainly is another option. I feel like most people have the
 _expectation_ of http resolvability for http URIs though, even
 though it isn't actually required. If you want there to be an actual
 http server there at ALL, even one that just responds to all
 requests with a link to the SuDoc documentation, that's another
 thing you need.

I think there is a strong expectation that if I resolve a URI, I do
not end up with a domain squatter. Otherwise I am not so sure what is
expected when using an HTTP URI whose primary purpose is
identification, not dereferencing. Personally I would be happy to get
either a page telling me to check back later [2], or nothing at all.

best,
Erik Hetzner

1. My last word on this. Because I am already beating a dead horse, I
have put it in a footnote. For $100 and basically no time at all you
can have 10 years of sudoc.info. If it takes an organization more than
2 or 3 hours of work to register an info: URI, then domain
registration is a better deal, as I see it.

2. http://lccn.info/2002022641
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3

