Re: [Dspace-tech] Persistent identifiers in DSpace -- thoughts please

Graham Triggs Fri, 25 May 2007 15:36:08 -0700

Hi,

> 1) Why would an institution use more than one PI
> system?  How do you determine which PI system generates a PId (base it
> on collection, community)?


There are a lot of theoretical reasons why multiple PI schemes may be in 
use. Even if you have the simple case of an institute / repository defining 
a single PI scheme that it always uses for the contents of the repository, 
depending on what content is being added, there may already be other PIs 
associated with an item that is being deposited (for example, a published 
article may have a DOI).

Beyond that, you may have repositories that have mandated different PI 
schemes being merged, and therefore all those existing PIs need to be 
supported, as well as new ones for the final repository possibly having to 
be assigned.

And with all the issues surrounding 'ownership' and encouraging the use of 
the repository, it may well prove necessary to support (and mandate) 
different PI schemes on a community or collection level.
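
As a rough sketch of how a scheme could be chosen per collection or 
community, here is a simple fallback lookup. The policy table, the scheme 
names and the function name are all assumptions made up for illustration; 
none of this is existing DSpace configuration.

```python
# Hypothetical policy table: (level, name) -> PI scheme name.
POLICY = {
    ("collection", "theses"): "handle",
    ("community", "physics"): "doi",
}

def scheme_for(collection, community, policy=POLICY, default="local"):
    """Pick a PI scheme: collection rule first, then community, then default."""
    for key in (("collection", collection), ("community", community)):
        if key in policy:
            return policy[key]
    return default

print(scheme_for("theses", "physics"))     # collection rule wins
print(scheme_for("preprints", "physics"))  # falls back to the community rule
print(scheme_for("reports", "history"))    # falls back to the repository default
```

The most specific rule wins, so a collection can override its community 
without the repository-wide default having to change.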

> 2)  It is mentioned that HTTP isn't "persistent":  Could someone explain
> why HTTP isn't as persistent as any other protocol?

Forget to pay your domain registration fee on time and see how persistent it 
is ;-)

Potentially more problematic, what happens when part (or all) of a 
repository is migrated into another? Can the domain be transferred to the 
'new' location? If not, can URL forwarding be set up on the old URLs?

HTTP can provide a unique identifier for an object at a given point in time, 
but it isn't necessarily going to be possible to rely on it always resolving 
to the same object over its entire lifetime.
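
To make the forwarding question concrete, here is a minimal sketch of what 
the old location would have to keep running after a migration: a table of 
old URLs pointing at new ones, followed until no further entry applies. 
The URLs and the function name are invented for the example.

```python
def follow_forwards(url, forwards, max_hops=10):
    """Follow old-URL -> new-URL forwarding entries until none applies."""
    seen = set()
    while url in forwards:
        if url in seen or len(seen) >= max_hops:
            raise RuntimeError("forwarding loop or chain too long")
        seen.add(url)
        url = forwards[url]
    return url

# Hypothetical forwarding table left behind by two successive migrations.
forwards = {
    "http://old.example.org/item/1": "http://merged.example.org/item/9001",
    "http://merged.example.org/item/9001": "http://final.example.org/item/42",
}
print(follow_forwards("http://old.example.org/item/1", forwards))
```

The identifier only stays resolvable for as long as somebody keeps every 
link in that chain alive, including the original domain.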

> 3) Including special characters in the URL string doesn't seem like a
> good idea.  While they are valid characters, it does take extra
> processing to encode/decode them from layer to layer.

Totally agreed - having colons, etc. in the URL is going to lead to problems 
in some circumstances.
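
A small illustration of the encode/decode problem, using Python's standard 
urllib.parse. The identifier and the host name are made up for the example.

```python
from urllib.parse import quote, unquote

# Hypothetical identifier containing a colon and slashes (illustrative only).
pid = "info:hdl/1234.5/678"

# safe="" forces every reserved character, including "/", to be encoded.
encoded = quote(pid, safe="")
url = "http://repository.example.org/resolve/" + encoded
print(url)

# A layer that encodes an already-encoded value produces a different string.
double_encoded = quote(encoded, safe="")
print(double_encoded)

# Decoding once recovers the identifier only if it was encoded exactly once.
print(unquote(encoded) == pid)
print(unquote(double_encoded) == pid)
```

Every layer between the browser and the application has to agree on how 
many times the value has been encoded, which is where things tend to go 
wrong.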

> 4) Assigning bitstreams persistent identifiers seems dangerous.  At the
> very least, version control and a history function are required by the
> application and PI system to determine if the PId is actually pointing
> to what was requested.  Also, how are multiple bitstreams handled when
> assigned to an item?  Does each bitstream get a PId?  How does a user
> look at all bitstreams associated together by the item when the PId
> references only a single bitstream?

We had a fair amount of discussion about these issues during the 
architectural review last year, which largely centered on extensions to the 
existing mechanism for referencing a specific (or simply the latest) 
version of a bitstream relative to the item.

Whether there is a need to assign an 'actual' PI to individual bitstreams or 
not is very much a policy decision of the repository. Assigning a PI to an 
individual bitstream does not mean that it happens in lieu of assigning one 
to the item itself - so if you want to look at other bitstreams associated 
to the same item, you should use the item PI (and if a user has only been 
given a PI for a specific bitstream, then they could potentially search for 
the item that refers to the bitstream identified by that PI).

As for versioning, again it's a bit of a policy decision, but a PI could be 
assigned to a specific revision (and therefore a new revision would get a 
new PI). You could also have a 'special' PI that would always refer to the 
latest revision.
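
One possible shape for this, as a sketch only: each revision gets its own 
PI, and one extra PI is reserved as an alias for whatever the newest 
revision is. The class and the identifier strings are hypothetical and do 
not correspond to anything in the DSpace code.

```python
class BitstreamHistory:
    """Hypothetical store mapping PIs to the revisions of one bitstream."""

    def __init__(self, latest_pi):
        self.latest_pi = latest_pi  # 'special' PI that tracks the newest revision
        self.revisions = {}         # revision PI -> content
        self.order = []             # revision PIs, oldest first

    def add_revision(self, pi, content):
        if pi in self.revisions or pi == self.latest_pi:
            raise ValueError(f"PI already assigned: {pi}")
        self.revisions[pi] = content
        self.order.append(pi)

    def resolve(self, pi):
        if pi == self.latest_pi:
            if not self.order:
                raise KeyError(pi)
            return self.revisions[self.order[-1]]
        return self.revisions[pi]

history = BitstreamHistory(latest_pi="pi:example/report/latest")
history.add_revision("pi:example/report/v1", b"first draft")
history.add_revision("pi:example/report/v2", b"corrected text")

print(history.resolve("pi:example/report/v1"))      # always the first revision
print(history.resolve("pi:example/report/latest"))  # currently the second revision
```

A revision PI never changes what it points to, so anybody who needs to be 
sure they are citing exactly what they read should be given that one rather 
than the 'latest' alias.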

> As far as having a default PI system out of the box for Dspace, I would
> recommend using a local identifier schema which used the existing URLs.
> Include the Handle PI system in the release as a configurable option,
> but not turned on by default.  This would remove the fake handle being
> assigned to all objects and clean up the default URLs out of the box.

Well, now to be controversial. IMHO, too much importance is being placed on 
PIs. Yes, PIs are important for preservation, but that doesn't mean that 
they have to be treated as something specific and central to DSpace.

PIs are 'just' metadata, and supporting multiple ways to resolve a piece (or 
a combination of pieces) of metadata to an asset - or simply presenting 
them in display - isn't really that hard.
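
To show what I mean by resolving metadata to an asset, here is a minimal 
sketch: an index keyed on (field, value) that knows nothing about which 
fields happen to hold PIs. The class, the field names and the values are 
assumptions for illustration only.

```python
from collections import defaultdict

class MetadataIndex:
    """Hypothetical lookup from any (field, value) pair to asset ids."""

    def __init__(self):
        self._index = defaultdict(set)  # (field, value) -> set of asset ids

    def add(self, asset_id, field, value):
        self._index[(field, value)].add(asset_id)

    def resolve(self, field, value):
        # Sorted list so that callers get a stable order.
        return sorted(self._index.get((field, value), set()))

index = MetadataIndex()
index.add("asset-1", "identifier.doi", "10.9999/example.1")
index.add("asset-1", "identifier.handle", "1234.5/678")
index.add("asset-2", "identifier.handle", "1234.5/679")

print(index.resolve("identifier.doi", "10.9999/example.1"))
print(index.resolve("identifier.handle", "1234.5/679"))
```

Supporting another PI scheme is then just a matter of adding another field; 
the resolution code does not change.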

Now there are special concerns about the handling - ensuring its presence, 
automatic generation/assignment, ensuring uniqueness (probably) - but that's 
all just a question of providing better workflows and metadata handling. In 
other words, any concerns that we have about how we handle persistent 
identifiers could be applicable to any piece (or combination) of metadata - 
and by that token, solving those issues for all metadata would resolve the 
issues for PIs, just by treating them as 'only' metadata.
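
As a sketch of what generic handling might look like, the check below 
enforces presence and uniqueness for any metadata field, not just one that 
holds a PI. The function name, the field name and the data layout are 
assumptions for the example.

```python
from collections import Counter

def check_field(assets, field, required=True, unique=True):
    """Return a list of problems for one metadata field across all assets.

    assets: dict mapping asset id -> dict of metadata (field -> value)
    """
    problems = []
    values = Counter()
    for asset_id, metadata in assets.items():
        value = metadata.get(field)
        if value is None:
            if required:
                problems.append(f"{asset_id}: missing {field}")
        else:
            values[value] += 1
    if unique:
        for value, count in values.items():
            if count > 1:
                problems.append(f"{field}={value} used by {count} assets")
    return problems

assets = {
    "a": {"identifier": "x"},
    "b": {"identifier": "x"},
    "c": {},
}
for problem in check_field(assets, "identifier"):
    print(problem)
```

The same call with a different field name would police any other piece of 
metadata that a repository decides must be present or unique.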

This would mean that the only id we need to centrally worry about assigning 
to an asset is a unique id to be resolvable within the repository - i.e. a 
UUID, which would likely be unique across all DSpace instances, and as such 
could be maintained across migrating from one repository to another. And in 
the very unlikely event that (during migration) you have to have two assets 
with the same UUID in the same repository, then you could replace those 
UUIDs with unique values (within that repository), and maintain the original 
UUID as secondary metadata on both assets which can then be resolved through 
a disambiguation page if somebody ever really did need to refer to something 
via the UUID (although in most cases it would be encouraged that those UUIDs 
are never publicised).
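
A minimal sketch of that migration rule, using Python's uuid module and 
plain dicts standing in for the repository. The function names and the data 
layout are hypothetical; the point is only that a clash re-keys both assets 
and keeps the original UUID as a secondary lookup.

```python
import uuid

def migrate_in(repo, originals, asset_id, asset):
    """Add an asset to repo; on a UUID clash, give both assets new UUIDs.

    repo:      dict mapping current UUID -> asset
    originals: dict mapping a clashing original UUID -> list of current UUIDs
    """
    if asset_id in originals:
        # This UUID has clashed before: add one more candidate.
        new_id = uuid.uuid4()
        repo[new_id] = asset
        originals[asset_id].append(new_id)
        return new_id
    if asset_id not in repo:
        repo[asset_id] = asset
        return asset_id
    # First clash: re-key the existing asset as well as the incoming one.
    existing = repo.pop(asset_id)
    existing_new, incoming_new = uuid.uuid4(), uuid.uuid4()
    repo[existing_new] = existing
    repo[incoming_new] = asset
    originals[asset_id] = [existing_new, incoming_new]
    return incoming_new

def resolve(repo, originals, asset_id):
    """Return matching assets; more than one calls for a disambiguation page."""
    if asset_id in repo:
        return [repo[asset_id]]
    return [repo[i] for i in originals.get(asset_id, [])]

repo, originals = {}, {}
shared = uuid.uuid4()
migrate_in(repo, originals, shared, "asset from repository A")
migrate_in(repo, originals, shared, "asset from repository B")
print(resolve(repo, originals, shared))
```

In the normal case the UUID is carried over untouched; the secondary table 
is only ever populated when a clash actually happens.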

G 

