Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

Ross Singer Thu, 30 Apr 2009 19:21:57 -0700

On Thu, Apr 30, 2009 at 6:37 PM, Peter Noerr <[email protected]> wrote:
> Some further observations. So far this threadling has mentioned only trying 
> to unify two different sets of identifiers. However there are a much larger 
> number of them out there (and even larger numbers of schemas and other 
> "standard-things-that-everyone-should-use-so-we-all-know-what-we-are-talking-about")
>  and the problem exists for any of these things (identifiers, etc.) where 
> there are more than one of them. So really unifying two sets of identifiers, 
> while very useful, is not actually going to solve much.


Well, that wasn't really my intention (although I thought it wouldn't
be a bad start).  What I would really prefer is that we compile these
into a single vocabulary that could be used as a reference point.
>
> Is there any broader methodology we could approach which potentially allows 
> multiple unifications or (my favourite) cross-walks. (Complete unification 
> requires everybody agrees and sticks to it, and human history is sort of not 
> on that track...) And who (people and organizations) would undertake this?

Realistically, we could achieve this via the NSDL MetadataRegistry and SKOS.

We could have something like:
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

<http://purl.org/DataFormat/marcxml>
  skos:prefLabel "MARC21 XML" ;
  skos:notation "info:srw/schema/1/marcxml-v1.1" ;
  skos:notation "info:ofi/fmt:xml:xsd:MARC21" ;
  skos:notation "http://www.loc.gov/MARC21/slim" ;
  skos:broader <http://purl.org/DataFormat/marc> ;
  skos:definition "..." .

Or maybe those skos:notations should be owl:sameAs -- anyway, that's
not really the point.  The point is that all of these various
identifiers would be valid, but we'd have a real way of knowing what
they actually mean.  Maybe this is what you mean by a crosswalk.
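
To make the lookup idea concrete, here is a minimal Python sketch, assuming
the vocabulary has already been loaded into a plain dictionary. The
purl.org/DataFormat URIs are the illustrative ones from the record above, and
REGISTRY, build_index, resolve and same_format are hypothetical names, not
part of any existing registry or library. It only shows how several existing
identifiers could resolve to one reference concept.

```python
# Hypothetical in-memory copy of the vocabulary: one entry per format concept.
# The URIs are the illustrative ones from the example record, not a live service.
REGISTRY = {
    "http://purl.org/DataFormat/marcxml": {
        "label": "MARC21 XML",
        "broader": "http://purl.org/DataFormat/marc",
        "notations": [
            "info:srw/schema/1/marcxml-v1.1",
            "info:ofi/fmt:xml:xsd:MARC21",
            "http://www.loc.gov/MARC21/slim",
        ],
    },
}


def build_index(registry):
    """Map every known identifier (and each concept URI itself) to its concept URI."""
    index = {}
    for uri, entry in registry.items():
        index[uri] = uri
        for notation in entry["notations"]:
            index[notation] = uri
    return index


INDEX = build_index(REGISTRY)


def resolve(identifier):
    """Return the concept URI for an identifier, or None if it is unknown."""
    return INDEX.get(identifier)


def same_format(a, b):
    """True only when both identifiers are known and name the same concept."""
    concept_a, concept_b = resolve(a), resolve(b)
    return concept_a is not None and concept_a == concept_b
```

With this, the SRU schema identifier and the OpenURL format identifier both
resolve to the same concept, while an identifier nobody has registered simply
resolves to nothing rather than being rejected as "wrong".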
>
> Ross' point about a lightweight approach is necessary for any sort of 
> adoption, but this is a problem (which plagues all we do in federated search) 
> which cannot just be solved by another registry. Somebody/organisation has to 
> look at the identifiers or whatever and decide that two of them are identical 
> or, worse, only partially overlap and hence scope has to be defined. In a 
> syntax that all understand of course. Already in this thread we have the 
> sub/super case question from Karen (in a post on the openurl (or Z39.88 
> <sigh> - identifiers!) listserv). And the various identifiers for MARC 
> (below) could easily be for MARC-XML, MARC21-ISO2709, MARCUK-ISO2709. Now 
> explain in words of one (computer understandable) syllable what the 
> differences are.

This is indeed a valid point.  However, the two registries that
already exist have this sort of granularity there (hence why they
weren't exactly describing the *same* ONIX version).

I guess I'm not really as worried about this problem because I think
that if people actually use it, and the system is flexible and editable,
the semantics will be worked out.
>
> I'm not trying to make problems. There are problems and this is only a small 
> subset of them, and they confound us every day. I would love to adopt 
> standard definitions for these things, but which Standard? Because anyone can 
> produce any identifier they like, we have decided that the unification of 
> them has to be kept internal where we at least have control of the 
> unifications, even if they change pretty frequently.

Right, which is why I'm feeling less picky about which one is "right".

-Ross.
