Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

Peter Noerr Thu, 30 Apr 2009 19:22:34 -0700

I just wanted to be sure that the larger extent of this problem was raised. Two 
(or 4) groups solving the issue is a great start.


However, what you learn here may not be applicable in the large. And some of us 
do have this large problem today. So we work through it in small steps in an 
extensible fashion - which for me is not attempting to create the overall grand 
unified set of everything.

Peter

> -----Original Message-----
> From: Code for Libraries [mailto:[email protected]] On Behalf Of
> Ross Singer
> Sent: Thursday, April 30, 2009 18:53
> To: [email protected]
> Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule
> Them All
> 
> Technically it's 4 communities, but, yes, only two currently have
> "credible" registries in place.
> 
> -Ross.
> 
> On Thu, Apr 30, 2009 at 9:28 PM, Jonathan Rochkind <[email protected]>
> wrote:
> > Crosswalk is exactly the wrong answer for this. Two very small
> overlapping communities of most library developers can surely agree on
> using the same identifiers, and then we make things easier for US.  We
> don't need to solve the entire universe of problems. Solve the simple
> problem in front of you in the simplest way that could possibly work and
> still leave room for future expansion and improvement. From that, we learn
> how to solve the big problems, when we're ready. Overreach and try to solve
> the huge problem including every possible use case, many of which don't
> apply to you but SOMEDAY MIGHT... and you end up with the kind of over-
> abstracted over-engineered too-complicated-to-actually-catch-on solutions
> that... we in the library community normally end up with.
> > ________________________________________
> > From: Code for Libraries [[email protected]] On Behalf Of Peter
> Noerr [[email protected]]
> > Sent: Thursday, April 30, 2009 6:37 PM
> > To: [email protected]
> > Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule
> Them All
> >
> > Some further observations. So far this threadling has mentioned only
> trying to unify two different sets of identifiers. However, there is a much
> larger number of them out there (and even larger numbers of schemas and
> other "standard-things-that-everyone-should-use-so-we-all-know-what-we-are-
> talking-about") and the problem exists for any of these things
> (identifiers, etc.) where there is more than one of them. So really
> unifying two sets of identifiers, while very useful, is not actually going
> to solve much.
> >
> > Is there any broader methodology we could adopt which potentially
> allows multiple unifications or (my favourite) cross-walks?  (Complete
> unification requires everybody agrees and sticks to it, and human history
> is sort of not on that track...) And who (people and organizations) would
> undertake this?
> >
> > Ross' point about a lightweight approach is necessary for any sort of
> adoption, but this is a problem (which plagues all we do in federated
> search) which cannot just be solved by another registry.
> Somebody/organisation has to look at the identifiers or whatever and decide
> that two of them are identical or, worse, only partially overlap and hence
> scope has to be defined. In a syntax that all understand of course. Already
> in this thread we have the sub/super case question from Karen (in a post on
> the openurl (or Z39.88 <sigh> - identifiers!) listserv). And the various
> identifiers for MARC (below) could easily be for MARC-XML, MARC21-ISO2709,
> MARCUK-ISO2709. Now explain in words of one (computer understandable)
> syllable what the differences are.
> >
> > I'm not trying to make problems. There are problems and this is only a
> small subset of them, and they confound us every day. I would love to adopt
> standard definitions for these things, but which Standard? Because anyone
> can produce any identifier they like, we have decided that the unification
> of them has to be kept internal where we at least have control of the
> unifications, even if they change pretty frequently.
> >
> > Peter
> >
> >
> > Dr Peter Noerr
> > CTO, MuseGlobal, Inc.
> >
> > +1 415 896 6873 (office)
> > +1 415 793 6547 (mobile)
> > www.museglobal.com
> >
> >
> >> -----Original Message-----
> >> From: Code for Libraries [mailto:[email protected]] On Behalf Of
> >> Ross Singer
> >> Sent: Thursday, April 30, 2009 12:00
> >> To: [email protected]
> >> Subject: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
> >>
> >> Hello everybody.  I apologize for the crossposting, but this is an
> >> area that could (potentially) affect every one of these groups.  I
> >> realize that not everybody will be able to respond to all lists,
> >> but...
> >>
> >> First of all, some back story (Code4Lib subscribers can probably skip
> >> ahead):
> >>
> >> Jangle [1] requires URIs to explicitly declare the format of the data
> >> it is transporting (binary marc, marcxml, vcard, DLF
> >> simpleAvailability, MODS, EAD, etc.).  In the past, it has used its
> >> own URI structure for this (http://jangle.org/vocab/formats#...) but
> >> this has always been with the intention of moving out of the
> >> jangle.org into a more "generic" space so it could be used by other
> >> initiatives.
> >>
> >> This same concept came up in UnAPI [2] (I think this thread:
> >> http://old.onebiglibrary.net/yale/cipolo/gcs-pcs-list/2006-March/thread.html#682
> >> discusses it a bit - there is a reference there that it maybe had come
> >> up before) although it was rejected ultimately in favor of an (optional)
> >> approach more in line with how OAI-PMH disambiguates metadata formats.
> >>  That being said, this page used to try to set a sort of convention
> >> around the UnAPI formats:
> >> http://unapi.stikipad.com/unapi/show/existing+formats
> >> But it's now just a squatter page.
> >>
> >> Jakob Voss pointed out that SRU has a schema registry and that it
> >> would make sense to coordinate with this rather than mint new URIs for
> >> things that have already been defined there:
> >> http://www.loc.gov/standards/sru/resources/schemas.html
> >>
> >> This, of course, made a lot of sense.  It also made me realize that
> >> OpenURL *also* has a registry of metadata formats:
> >>
> >> http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&set=Core:Metadata+Formats
> >>
> >> The problem here is that OpenURL and SRW are using different info URIs
> >> to describe the same things:
> >>
> >> info:srw/schema/1/marcxml-v1.1
> >>
> >> info:ofi/fmt:xml:xsd:MARC21
> >>
> >> or
> >>
> >> info:srw/schema/1/onix-v2.0
> >>
> >> info:ofi/fmt:xml:xsd:onix
> >>
> >> The latter technically isn't the same thing since the OpenURL one
> >> claims it's an identifier for ONIX 2.1, but if I wasn't sending this
> >> email now, eventually SRU would have registered
> >> info:srw/schema/1/onix-v2.1
> >>
> >> There are several other examples, as well (MODS, ISO20775, etc.) and
> >> it's not a stretch to envision more in the future.
> >>
> >> So there are a couple of questions here.
> >>
> >> First, and most importantly, how do we reconcile these different
> >> identifiers for the same thing?  Can we come up with some agreement on
> >> which ones we should really use?
> >>
> >> Secondly, and this gets to the reason why any of this was brought up
> >> in the first place, how can we coordinate these identifiers more
> >> effectively and efficiently to reuse among various specs and
> >> protocols, but not:
> >> 1) be tied to a particular community
> >> 2) require some laborious and lengthy submission and review process to
> >> just say "hey, here's my FOAF available via UnAPI"
> >> 3) be so lax that it throws all hope of authority out the window
> >> ?
> >>
> >> I would expect the various communities to still maintain their own
> >> registries of "approved" data formats (well, OpenURL and SRU, anyway
> >> -- it's not as appropriate to UnAPI or Jangle).
> >>
> >> Does something like this interest any of you?  Is there value in such
> >> an initiative?
> >>
> >> Thanks,
> >> -Ross.
> >
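
The identifier pairs quoted in Ross's original message can be reconciled, at
least internally, with a small alias table of the kind Peter describes keeping
in-house. What follows is only a minimal sketch in Python. The internal keys
("marcxml", "onix-2.0", "onix-2.1") and the helper names (canonical,
same_format) are hypothetical and are not part of SRU, OpenURL, or any
registry; only the info: URIs come from the thread.

```python
from typing import Optional

# Alias table: each known identifier maps to one internal key.
# The info: URIs are the ones quoted in the thread; the keys on the
# right are hypothetical names chosen for this illustration.
ALIASES = {
    "info:srw/schema/1/marcxml-v1.1": "marcxml",
    "info:ofi/fmt:xml:xsd:MARC21": "marcxml",
    # Per the thread, the OpenURL ONIX identifier claims ONIX 2.1,
    # so it is deliberately NOT unified with the SRU 2.0 identifier.
    "info:srw/schema/1/onix-v2.0": "onix-2.0",
    "info:ofi/fmt:xml:xsd:onix": "onix-2.1",
}


def canonical(identifier: str) -> Optional[str]:
    """Return the internal key for an identifier, or None if unknown."""
    return ALIASES.get(identifier.strip())


def same_format(a: str, b: str) -> bool:
    """True only when both identifiers are known and share a key."""
    key_a, key_b = canonical(a), canonical(b)
    return key_a is not None and key_a == key_b


if __name__ == "__main__":
    print(same_format("info:srw/schema/1/marcxml-v1.1",
                      "info:ofi/fmt:xml:xsd:MARC21"))
    print(same_format("info:srw/schema/1/onix-v2.0",
                      "info:ofi/fmt:xml:xsd:onix"))
```

A flat table like this only handles exact equivalence. The partial-overlap and
sub/super cases raised earlier in the thread would need something richer than
a one-to-one key, which is the harder part of the problem.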
