Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

Jonathan Rochkind Thu, 30 Apr 2009 18:34:46 -0700

Crosswalk is exactly the wrong answer for this. Two very small, overlapping
communities of mostly library developers can surely agree on using the same
identifiers, and then we make things easier for US. We don't need to solve the 
entire universe of problems. Solve the simple problem in front of you in the 
simplest way that could possibly work and still leave room for future expansion 
and improvement. From that, we learn how to solve the big problems, when we're 
ready. Overreach and try to solve the huge problem including every possible use 
case, many of which don't apply to you but SOMEDAY MIGHT... and you end up with 
the kind of over-abstracted over-engineered 
too-complicated-to-actually-catch-on solutions that... we in the library 
community normally end up with. 
________________________________________
From: Code for Libraries [[email protected]] On Behalf Of Peter Noerr 
[[email protected]]
Sent: Thursday, April 30, 2009 6:37 PM
To: [email protected]
Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All


Some further observations. So far this threadling has mentioned only trying to 
unify two different sets of identifiers. However there are a much larger number 
of them out there (and even larger numbers of schemas and other
"standard-things-that-everyone-should-use-so-we-all-know-what-we-are-talking-about"),
and the problem exists for any of these things (identifiers, etc.) where there
is more than one of them. So unifying just two sets of identifiers, while
very useful, is not actually going to solve much.

Is there any broader methodology we could adopt that would allow multiple
unifications or (my favourite) cross-walks? (Complete unification requires
that everybody agree and stick to it, and human history is sort of not on
that track...) And who (people and organizations) would undertake this?

Ross' point about a lightweight approach is well taken: lightness is necessary
for any sort of adoption. But this is a problem (which plagues all we do in
federated search) that cannot be solved just by another registry. Somebody or
some organisation has to look at
the identifiers or whatever and decide that two of them are identical or, 
worse, only partially overlap and hence scope has to be defined. In a syntax 
that all understand of course. Already in this thread we have the sub/super 
case question from Karen (in a post on the openurl (or Z39.88 <sigh> - 
identifiers!) listserv). And the various identifiers for MARC (below) could 
easily be for MARC-XML, MARC21-ISO2709, MARCUK-ISO2709. Now explain in words of 
one (computer understandable) syllable what the differences are.
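To make the partial-overlap problem concrete, here is a minimal Python sketch of a crosswalk in which every mapping carries an explicit relation type instead of a bare "same as". The relation names are borrowed loosely from the SKOS mapping properties (exactMatch, closeMatch, broadMatch, narrowMatch). The two info URI pairs come from Ross's message below; `Mapping`, `lookup`, and the keys `marc21-iso2709`, `marcuk-iso2709` and `marc-iso2709` are hypothetical names for illustration, and every relation assigned here is an assumption, not an authoritative judgement.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class Relation(Enum):
    # Names loosely follow the SKOS mapping properties; read "source <relation> target".
    EXACT = "exactMatch"      # interchangeable
    CLOSE = "closeMatch"      # overlapping, but not safely interchangeable
    BROADER = "broadMatch"    # the target covers more than the source
    NARROWER = "narrowMatch"  # the target covers less than the source


@dataclass(frozen=True)
class Mapping:
    source: str
    target: str
    relation: Relation


# Illustrative entries only. The info URIs appear in this thread; the MARC
# serialisation keys are hypothetical, and every relation is an assumption.
CROSSWALK: List[Mapping] = [
    Mapping("info:srw/schema/1/marcxml-v1.1", "info:ofi/fmt:xml:xsd:MARC21", Relation.EXACT),
    Mapping("info:srw/schema/1/onix-v2.0", "info:ofi/fmt:xml:xsd:onix", Relation.CLOSE),
    Mapping("marc21-iso2709", "marc-iso2709", Relation.BROADER),
    Mapping("marcuk-iso2709", "marc-iso2709", Relation.BROADER),
]


def lookup(source: str, accept: Iterable[Relation] = (Relation.EXACT,)) -> List[str]:
    """Return the targets mapped from `source` whose relation the caller accepts.

    Mappings are directional: only entries whose source matches are consulted.
    """
    allowed = set(accept)
    return [m.target for m in CROSSWALK if m.source == source and m.relation in allowed]
```

A federated search display that can live with some loss might accept CLOSE as well as EXACT, while anything converting records should probably accept EXACT only. The entries are one-way; a fuller crosswalk would also record the inverse of each relation (broadMatch inverts to narrowMatch), which is exactly the scope-definition work described above.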

I'm not trying to make problems. There are problems and this is only a small 
subset of them, and they confound us every day. I would love to adopt standard 
definitions for these things, but which Standard? Because anyone can produce 
any identifier they like, we have decided that the unification of them has to 
be kept internal where we at least have control of the unifications, even if 
they change pretty frequently.

Peter


Dr Peter Noerr
CTO, MuseGlobal, Inc.

+1 415 896 6873 (office)
+1 415 793 6547 (mobile)
www.museglobal.com


> -----Original Message-----
> From: Code for Libraries [mailto:[email protected]] On Behalf Of
> Ross Singer
> Sent: Thursday, April 30, 2009 12:00
> To: [email protected]
> Subject: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
>
> Hello everybody.  I apologize for the crossposting, but this is an
> area that could (potentially) affect every one of these groups.  I
> realize that not everybody will be able to respond to all lists,
> but...
>
> First of all, some back story (Code4Lib subscribers can probably skip
> ahead):
>
> Jangle [1] requires URIs to explicitly declare the format of the data
> it is transporting (binary marc, marcxml, vcard, DLF
> simpleAvailability, MODS, EAD, etc.).  In the past, it has used its
> own URI structure for this (http://jangle.org/vocab/formats#...), but
> this has always been with the intention of moving out of the
> jangle.org domain into a more "generic" space so it could be used by other
> initiatives.
>
> This same concept came up in UnAPI [2] (I think this thread:
> http://old.onebiglibrary.net/yale/cipolo/gcs-pcs-list/2006-March/thread.html#682
> discusses it a bit; there is a reference there suggesting it may have come
> up before), although it was ultimately rejected in favor of an (optional)
> approach more in line with how OAI-PMH disambiguates metadata formats.
> That being said, this page used to try to set some sort of convention
> around the UnAPI formats:
> http://unapi.stikipad.com/unapi/show/existing+formats
> But it's now just a squatter page.
>
> Jakob Voss pointed out that SRU has a schema registry and that it
> would make sense to coordinate with this rather than mint new URIs for
> things that have already been defined there:
> http://www.loc.gov/standards/sru/resources/schemas.html
>
> This, of course, made a lot of sense.  It also made me realize that
> OpenURL *also* has a registry of metadata formats:
> http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&set=Core:Metadata+Formats
>
> The problem here is that OpenURL and SRW are using different info URIs
> to describe the same things:
>
> info:srw/schema/1/marcxml-v1.1
>
> info:ofi/fmt:xml:xsd:MARC21
>
> or
>
> info:srw/schema/1/onix-v2.0
>
> info:ofi/fmt:xml:xsd:onix
>
> The latter technically isn't the same thing, since the OpenURL one
> claims it's an identifier for ONIX 2.1, but if I weren't sending this
> email now, eventually SRU would have registered
> info:srw/schema/1/onix-v2.1
>
> There are several other examples, as well (MODS, ISO20775, etc.) and
> it's not a stretch to envision more in the future.
>
> So there are a couple of questions here.
>
> First, and most importantly, how do we reconcile these different
> identifiers for the same thing?  Can we come up with some agreement on
> which ones we should really use?
>
> Secondly, and this gets to the reason why any of this was brought up
> in the first place, how can we coordinate these identifiers more
> effectively and efficiently to reuse among various specs and
> protocols, but not:
> 1) be tied to a particular community
> 2) require some laborious and lengthy submission and review process to
> just say "hey, here's my FOAF available via UnAPI"
> 3) be so lax that it throws all hope of authority out the window
> ?
>
> I would expect the various communities to still maintain their own
> registries of "approved" data formats (well, OpenURL and SRU, anyway
> -- it's not as appropriate to UnAPI or Jangle).
>
> Does something like this interest any of you?  Is there value in such
> an initiative?
>
> Thanks,
> -Ross.
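Ross's first question, reconciling different identifiers for the same thing, can be sketched as an alias table in which each registry's URI points at one canonical record. This is a minimal Python sketch under stated assumptions, not an existing registry API: `FormatRegistry`, `info_namespace`, the canonical keys (`marcxml`, `onix-2.0`, `onix-2.1`) and the labels are all hypothetical. ONIX is kept as two records because, as Ross notes, the SRU URI names 2.0 while the OpenURL URI claims 2.1. The `info_namespace` helper relies only on the general `info:namespace/identifier` shape of info URIs.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class FormatRecord:
    key: str  # hypothetical canonical key, e.g. "marcxml"
    label: str
    aliases: Set[str] = field(default_factory=set)  # URIs minted by other registries


class FormatRegistry:
    """Hypothetical alias table: many registry URIs resolve to one canonical record."""

    def __init__(self) -> None:
        self._records: Dict[str, FormatRecord] = {}
        self._alias_index: Dict[str, str] = {}  # alias URI -> canonical key

    def register(self, key: str, label: str, *aliases: str) -> None:
        # Validate first, so a conflicting call leaves the registry unchanged.
        for alias in aliases:
            owner = self._alias_index.get(alias)
            if owner is not None and owner != key:
                raise ValueError(f"{alias} is already registered to {owner!r}")
        record = self._records.setdefault(key, FormatRecord(key, label))
        for alias in aliases:
            record.aliases.add(alias)
            self._alias_index[alias] = key

    def canonical(self, identifier: str) -> Optional[str]:
        """Return the canonical key for a registry URI, or None if unknown."""
        return self._alias_index.get(identifier)

    def same_format(self, a: str, b: str) -> bool:
        """True only when both URIs are known and resolve to the same record."""
        key_a = self.canonical(a)
        return key_a is not None and key_a == self.canonical(b)


def info_namespace(uri: str) -> str:
    """Return the namespace part of an info URI (info:<namespace>/<identifier>)."""
    if not uri.startswith("info:") or "/" not in uri:
        raise ValueError(f"not an info URI: {uri}")
    return uri[len("info:"):].split("/", 1)[0]


registry = FormatRegistry()
registry.register("marcxml", "MARC 21 XML",
                  "info:srw/schema/1/marcxml-v1.1", "info:ofi/fmt:xml:xsd:MARC21")
# Kept apart on purpose: the SRU URI names ONIX 2.0, the OpenURL URI claims 2.1.
registry.register("onix-2.0", "ONIX 2.0", "info:srw/schema/1/onix-v2.0")
registry.register("onix-2.1", "ONIX 2.1", "info:ofi/fmt:xml:xsd:onix")
```

With these entries, the two MARCXML URIs resolve to the same record, the two ONIX URIs do not, and `info_namespace` reports `srw` or `ofi` for the respective URIs. Keeping each community's URI as an alias, rather than replacing it, fits the expectation that OpenURL and SRU keep maintaining their own registries of approved formats.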
