Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

Jonathan Rochkind Fri, 08 May 2009 07:33:03 -0700

I don't understand from your description how Topic Maps solve the "identifying multiple versions of a standard" problem. Which was the original question, right? Or have I gotten confused? I didn't think the original question was even about topic vocabularies, but about how to best provide an identifier for (e.g.) Marc 2.1 and another for Marc 2.2, while still allowing machines to ignore versions if they like and just request and/or identify generic "marc". And you said that Topic Maps had a solution to this?

I am genuinely curious -- not necessarily because I'm ever going to use Topic Maps (sorry!), but because if they have a well-thought-out, tested solution to this, it could serve as a model in other contexts.


Jonathan

Alexander Johannesen wrote:

On Wed, May 6, 2009 at 18:44, Mike Taylor <[email protected]> wrote:

Can't you just tell us?


Sorry, but surely you must be tired of me banging on this gong by now?
It's not that I don't want to seem helpful, but I've been writing a
bit on this here already and don't want to be marked as spam for Topic
Maps.

In the Topic Maps world our global identifiers are called PSIs, for
Published Subject Indicators. There are a few subtleties within this,
but they are not so different from any other identifier you'll find
elsewhere (RDF, library world, etc.) except of course they are
*always* URIs. Now, the thing here is that they should *always* be
published somewhere, whether as part of a list or elsewhere. The
next thing is that they should always resolve to something (the
standard doesn't require this, but I'd say you're doing it wrong
if yours don't, even if that is sometimes a necessary evil).

This last part is really the important bit, where any PSI will
1) act as a global identifier, and 2) resolve to human-readable text
explaining what it represents. Systems can "just use it" while at the
same time people can choose the right ones for their uses.

And, yes, the identifiers can be done any way you slice them. Some
might think that, e.g., a PSI set for all dates is crazy, as you need
to produce identifiers for all dates (or times), and that would be
just way too much to deal with, but again, that's not an identification
problem, that's a resolver problem. If I can browse to a PSI and get
the text that "this is 3rd of June, 1971, using the whatsnot calendar
style", then that's safe for me to use for my birthday. Let's pretend
the PSI is http://iso.org/datetime/03061971. Release a URI template
and computers can work with this automatically, no frills.
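
To make the URI template idea concrete, here is a minimal Python sketch.
It assumes the pretend PSI above follows a DDMMYYYY pattern; the prefix
and the function names are only illustrative, not part of any published
PSI set.

```python
from datetime import date, datetime

# Hypothetical template, modelled on the pretend PSI in the text:
# http://iso.org/datetime/{DDMMYYYY}
PSI_PREFIX = "http://iso.org/datetime/"


def date_to_psi(d: date) -> str:
    """Build the PSI for a date by filling in the assumed template."""
    return PSI_PREFIX + d.strftime("%d%m%Y")


def psi_to_date(psi: str) -> date:
    """Recover the date from a PSI that follows the assumed template."""
    if not psi.startswith(PSI_PREFIX):
        raise ValueError(f"not a PSI from this set: {psi}")
    return datetime.strptime(psi[len(PSI_PREFIX):], "%d%m%Y").date()


birthday_psi = date_to_psi(date(1971, 6, 3))
```

Nothing needs to be minted in advance here: a system that knows the
template can produce or read any date PSI, and serving the explanatory
text for each one is left to the resolver.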

Now a bit more technical; any topic (which is a Topic Map
representation of any subject, where "subject" is defined as "anything
you can ever hope to think of") can have more than one PSI, because I
might use the PSI http://someother.org/time/date/3/6/1971 for my date.
If my application only understands the former set of PSIs, I can't
merge and find similar cross-semantics (which really is the core of
the problem this thread has been talking about). But simply attach the
second PSI to the same Topic, and you can. In fact, both parties will
understand perfectly what you're talking about.
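
A rough Python sketch of that merging idea follows. It is a
simplification that assumes a topic is just a set of names plus a set of
PSIs, and that two topics merge when they share at least one PSI; the
Topic class and merge_topics function are made up for illustration and
are not any Topic Maps library's API.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Simplified topic: some names plus the PSIs that identify it."""
    names: set[str] = field(default_factory=set)
    psis: set[str] = field(default_factory=set)


def merge_topics(topics: list[Topic]) -> list[Topic]:
    """Merge topics that share at least one PSI (assumed merge rule)."""
    merged: list[Topic] = []
    for topic in topics:
        current = Topic(set(topic.names), set(topic.psis))
        untouched = []
        for existing in merged:
            if existing.psis & current.psis:
                # Shared identifier: same subject, so fold it in.
                current.names |= existing.names
                current.psis |= existing.psis
            else:
                untouched.append(existing)
        merged = untouched + [current]
    return merged


ISO_PSI = "http://iso.org/datetime/03061971"
OTHER_PSI = "http://someother.org/time/date/3/6/1971"

mine = Topic({"3rd of June, 1971"}, {ISO_PSI})
theirs = Topic({"3/6/1971"}, {OTHER_PSI})
# A topic carrying both PSIs is what lets the two sides meet.
bridge = Topic(set(), {ISO_PSI, OTHER_PSI})

separate = merge_topics([mine, theirs])
joined = merge_topics([mine, theirs, bridge])
```

Without the bridging topic the two date topics stay apart; with it,
everything collapses into one topic that carries both PSIs and both
names.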

More complex is that the definition of PSI sets doesn't have to
happen on the subject level, i.e. on the Topic called "Alex" to which I
tried to attach my birthday. It can be moved to a meta model level,
where you say the Topic for "Time and dates" has the PSIs for both
organisations, and all Topics just use one or the other; we're
shifting the explicitness of identification up a notch.
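
One possible reading of that meta model level, sketched in Python: the
knowledge that the two schemes describe the same kind of subject lives
in one small registry, and individual topics carry a PSI from either
scheme. The patterns below are guesses based on the two pretend PSIs in
this mail (day, month, year), and the function names are hypothetical.

```python
import re
from datetime import date

# Assumed "meta model": both PSI schemes identify calendar dates.
DATE_SCHEMES = [
    re.compile(
        r"^http://iso\.org/datetime/"
        r"(?P<d>\d{2})(?P<m>\d{2})(?P<y>\d{4})$"
    ),
    re.compile(
        r"^http://someother\.org/time/date/"
        r"(?P<d>\d{1,2})/(?P<m>\d{1,2})/(?P<y>\d{4})$"
    ),
]


def psi_to_date(psi: str):
    """Return the date a PSI stands for, or None if no scheme matches."""
    for pattern in DATE_SCHEMES:
        match = pattern.match(psi)
        if match:
            return date(int(match["y"]), int(match["m"]), int(match["d"]))
    return None


def same_subject(psi_a: str, psi_b: str) -> bool:
    """True when both PSIs are recognised and denote the same date."""
    date_a, date_b = psi_to_date(psi_a), psi_to_date(psi_b)
    return date_a is not None and date_a == date_b
```

The point of the sketch is where the complexity sits: adding a third
scheme means one more entry in the registry, not a change to every
topic in the map.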

Having multiple PSIs might seem a bit unordered, but it's based on the
notion of organic growth, just like the web. People will gravitate
towards using PSIs from the most trusted sources (or most accurate or
most whatever), shifting identification schemes around. This is a good
thing (organic growth) at the price of multiple identifiers, but if
the library world started creating PSIs, I betcha humanity and the
library world both could be saved in one fell swoop! (That's another
gong I like to bang)

I'm kinda anticipating Jonathan saying this is all so complex now. :)
But it's not really; your application only has to have complexity in
the small meta model you set up, *not* for every single Topic you've
got in your map. And they're mergeable and shareable, and as such can
be merged and "fixed" (or cleaned or sobered or made less complex) for
all your various needs also.

Anyway, that's the basics. Let me know if you want me to bang on. :)
For me, the problem libraries face isn't really the mechanics of
this (because this is solvable, and I guess you just have to trust
that the Topic Maps community has been doing this for the last 10
years or so already :), but how you're going to fit existing
resources into FRBR and RDA. That's a separate discussion, though.


Regards,

Alex
