I summarized my thoughts about identifiers for data formats in a blog posting: http://jakoblog.de/2009/05/10/who-identifies-the-identifiers/

In short it’s not a technology issue but a commitment issue and the problem of identifying the right identifiers for data formats can be reduced to two fundamental rules of thumb:

1. reuse: don’t create new identifiers for things that already have one.

2. document: if you have to create an identifier describe its referent as open, clear, and detailled as possible to make it reusable.

A format should be described with a schema (XML Schema, OWL etc.) or at least a standard. Mostly this schema already has a namespace or similar identifier that can be used for the whole format.

For instance MODS Version 3 (currently 3.0, 3.1, 3.2, 3.4) has the XML Namespace http://www.loc.gov/mods/v3 so this is the best identifier to identify MODS. If you need to identify a specific version then you should *first* look if such identifiers already exist, *second* push the publisher (LOC) to assign official URIs for MODS versions, if this do not already exist, or *third* create and document specific URIs and make that everyone knows about this identifiers. At the moment there are:

MODS Version 3     http://www.loc.gov/mods/v3
MODS Version 3.0   info:srw/schema/1/mods-v3.0
MODS Version 3.1   info:srw/schema/1/mods-v3.1
MODS Version 3.2   info:srw/schema/1/mods-v3.2
MODS Version 3.3   info:srw/schema/1/mods-v3.3

The SRU Schemas registry links the "info:srw/schema/1/mods-v3*" identifiers to its XML Schemas which is very little documentation but it links to http://www.loc.gov/mods/v3 at least in some way.

Ross wrote:

First, and most importantly, how do we reconcile these different
identifiers for the same thing?  Can we come up with some agreement on
which ones we should really use?

Use the one that is documented best.

Secondly, and this gets to the reason why any of this was brought up
in the first place, how can we coordinate these identifiers more
effectively and efficiently to reuse among various specs and
protocols, but not:
1) be tied to a particular community
2) require some laborious and lengthy submission and review process to
just say "hey, here's my FOAF available via UnAPI"

The identifier for FOAF is http://xmlns.com/foaf/0.1/. Forget about identifiers that are not URIs. OAI-PMH at least includes a mechanism to map metadataPrefixes to official URIs but this mechanism is not always used. If unAPI lacks a way to map a local name to a global URI, we should better fix unAPI to tell us:

<?xml version="1.0" encoding="UTF-8"?>
<formats xmlns="http://unapi.info/";>
  <format name="foaf" uri="http://xmlns.com/foaf/0.1/"/>

unAPI should be revised and specified bore strictly to become an RFC anyway. Yes, this requires a laborious and lengthy submission and review process but there is no such thing as a free lunch.

3) be so lax that it throws all hope of authority out the window

Reuse existing authorities and document better to create authority.

I would expect the various communities to still maintain their own
registries of "approved" data formats (well, OpenURL and SRU, anyway
-- it's not as appropriate to UnAPI or Jangle).

There should be a distinction between descriptive registries that only list identifiers and formats that are defined elsewhere and authoritative registries that define new identifiers and formats. The number of authoritatively defined identifiers should be small for a given API because the identifier should better be defined by the creator of the format instead by a user of the format. If the creator does not support usable identifiers then better talk to him instead of creating something in parallel.


Jakob Voß <jakob.v...@gbv.de>, skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242, http://www.gbv.de

Reply via email to