Re: [Dspace-tech] Persistent identifiers in DSpace -- thoughts please

Mark Diggory Fri, 25 May 2007 20:11:58 -0700

On May 25, 2007, at 6:35 PM, Graham Triggs wrote:

> Hi,
>
>> 1) Why would an institution use more than one PI system?  How do you
>> determine which PI system generates a PId (base it on collection,
>> community)?
>
> There are a lot of theoretical reasons why multiple PI schemes may be in
> use. Even if you have the simple case of an institute / repository defining
> a single PI scheme that it always uses for the contents of the repository,
> depending on what content is being added, there may already be other PIs
> associated with an item that is being deposited (for example, a published
> article may have a DOI).
>
> Beyond that, you may have repositories that have mandated different PI
> schemes being merged, and therefore all those existing PIs need to be
> supported, as well as new ones for the final repository possibly having to
> be assigned.
>
> And with all the issues surrounding 'ownership' and encouraging the use of
> the repository, it may well prove necessary to support (and mandate)
> different PI schemes on a community or collection level.
>
>> 2)  It is mentioned that HTTP isn't "persistent":  Could someone explain
>> why HTTP isn't as persistent as any other protocol?
>
> Forget to pay your domain registration fee on time and see how persistent it
> is ;-)
>
> Potentially more problematic, what happens when part (or all) of a
> repository is migrated into another? Can the domain be transferred to the
> 'new' location? If not, can URL forwarding be set up on the old URLs?
>
> HTTP can provide a unique identifier for an object at a given point in time,
> but it isn't necessarily going to be possible to rely on it always resolving
> to the same object over its entire lifetime.


But that's like comparing apples to "apple pickers". Forget
"resolution": an HTTP URL is just as much a URI as a Handle or DOI
is. If CNRI's global registration and resolving proxy service
disappears, what becomes of all the existing handles? Yes, they could
possibly still be considered persistent, but they would be worth little
more than unresolvable strings until a comparable resolution system is
reestablished.

>
>> 3) Including special characters in the URL string doesn't seem like a
>> good idea.  While they are valid characters, it does take extra
>> processing to encode/decode them from layer to layer.
>
> Totally agreed - having colons, etc. in the url is going to lead to problems
> in some circumstances.

Agreed. For DSpace identifiers, keep them simple for maximum
portability into other naming systems.
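
To make the "extra processing" concrete, here is a minimal Python sketch
of what happens when an identifier has to travel as a single URL path
segment. The two identifier strings are made up for illustration; they
are not real handles, and the helper name is hypothetical.

```python
from urllib.parse import quote, unquote

# Hypothetical identifiers, for illustration only.
simple_id = "123456789-1"
colon_id = "hdl:123456789/1"


def as_path_segment(identifier: str) -> str:
    """Percent-encode an identifier so it fits in one URL path segment."""
    # safe="" means reserved characters such as ':' and '/' are encoded too.
    return quote(identifier, safe="")


encoded_simple = as_path_segment(simple_id)  # unchanged: nothing to encode
encoded_colon = as_path_segment(colon_id)    # ':' and '/' become %3A and %2F

# Every layer that receives the encoded form has to decode it again.
round_trip = unquote(encoded_colon)
```

The simple identifier passes through untouched, while the one with a
colon and a slash has to be encoded and decoded at each hop, and any
layer that gets this wrong changes the identifier.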

>
>> 4) Assigning bitstreams persistent identifiers seems dangerous.  At the
>> very least, version control and a history function are required by the
>> application and PI system to determine if the PId is actually pointing
>> to what was requested.  Also, how are multiple bitstreams handled when
>> assigned to an item?  Does each bitstream get a PId?  How does a user
>> look at all bitstreams associated together by the item when the PId
>> references only a single bitstream?
>
> We had a fair amount of discussion about these issues during the
> architectural review last year - which were largely centered around
> extensions to the existing mechanism in order to reference specific (or
> simply the latest) version of a bitstream as relative to the item.
>
> Whether there is a need to assign an 'actual' PI to individual bitstreams or
> not is very much a policy decision of the repository. Assigning a PI to an
> individual bitstream does not mean that it happens in lieu of assigning one
> to the item itself - so if you want to look at other bitstreams associated
> to the same item, you should use the item PI (and if a user has only been
> given a PI for a specific bitstream, then they could potentially search for
> the item that refers to the bitstream identified by that PI).
>
> As for versioning, again it's a bit of a policy decision, but a PI could be
> assigned to a specific revision (and therefore a new revision would get a
> new PI). You could also have a 'special' PI that would always refer to the
> latest revision.

As long as any PI, or the "Bitstream" part of an Item PI, is
controllable and reassignable. As an example of what not to do: do
not take the current sequence ID and tack it onto the Item ID, because
then the replacement of a bitstream (because of an ingest error or other
policy) cannot have the appropriate identifier remapped to it. In
DSpace, a sequence ID can only be assigned to one bitstream; removing
that bitstream and adding another results in a new sequence ID. (But
actually, this is mostly moot once versioning of Items is introduced.)
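
A rough sketch of the difference, in Python rather than DSpace's own
code: the class, method names and the "fulltext" suffix are all
hypothetical, and the only behaviour taken from DSpace is that a removed
bitstream's sequence ID is never reused. The point is that the public
suffix is looked up in a small table instead of being derived from the
sequence ID.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Toy item: sequence IDs are internal, PI suffixes are remappable."""
    item_id: str
    next_sequence_id: int = 1
    bitstreams: dict = field(default_factory=dict)     # sequence ID -> file name
    suffix_to_seq: dict = field(default_factory=dict)  # PI suffix -> sequence ID

    def add_bitstream(self, name: str) -> int:
        # Sequence IDs only ever count upwards and are never reused.
        seq = self.next_sequence_id
        self.next_sequence_id += 1
        self.bitstreams[seq] = name
        return seq

    def remove_bitstream(self, seq: int) -> None:
        del self.bitstreams[seq]

    def assign_suffix(self, suffix: str, seq: int) -> None:
        # (Re)point a public suffix at whichever bitstream should answer to it.
        if seq not in self.bitstreams:
            raise KeyError(f"no bitstream with sequence ID {seq}")
        self.suffix_to_seq[suffix] = seq

    def resolve(self, suffix: str) -> str:
        return self.bitstreams[self.suffix_to_seq[suffix]]


item = Item("item-1")
bad = item.add_bitstream("thesis-corrupt.pdf")
item.assign_suffix("fulltext", bad)

# Replace the bitstream after an ingest error: the sequence ID changes...
item.remove_bitstream(bad)
good = item.add_bitstream("thesis.pdf")

# ...but the public suffix can simply be remapped to the replacement.
item.assign_suffix("fulltext", good)
resolved = item.resolve("fulltext")
```

Had the identifier been built as item ID plus sequence ID, the
replacement file could only ever be reached under a different identifier.
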

>> As far as having a default PI system out of the box for Dspace, I would
>> recommend using a local identifier schema which used the existing URLs.
>> Include the Handle PI system in the release as a configurable option,
>> but not turned on by default.  This would remove the fake handle being
>> assigned to all objects and clean up the default URLs out of the box.
>
> Well, now to be controversial. IMHO, too much importance is being focused on
> PIs. Yes, PIs are important for preservation, but that doesn't mean that
> they have to be treated as something specific and central to DSpace.
>
> PIs are 'just' metadata, and supporting multiple ways to resolve a piece
> (or a combination of pieces) of metadata to an asset - or simply presenting
> them in display - isn't really that hard.
>
> Now there are special concerns about the handling - ensuring its presence,
> automatic generation/assignment, ensuring uniqueness (probably) - but that's
> all just a question of providing better workflows and metadata handling. In
> other words, any concerns that we have about how we handle persistent
> identifiers could be applicable to any piece (or combination) of metadata -
> and by that token, solving those issues for all metadata would resolve the
> issues for PIs, just by treating them as 'only' metadata.
>
> This would mean that the only id we need to centrally worry about assigning
> to an asset is a unique id to be resolvable within the repository - ie. a
> UUID, which would likely be unique across all DSpace instances, and as such
> could be maintained across migrating from one repository to another. And in
> the very unlikely event that (during migration) you have to have two assets
> with the same UUID in the same repository, then you could replace those
> UUIDs with unique values (within that repository), and maintain the original
> UUID as secondary metadata on both assets which can then be resolved through
> a disambiguation page if somebody ever really did need to refer to something
> via the UUID (although in most cases it would be encouraged that those UUIDs
> are never publicised).

I very much agree with this line of thought. Generating UUIDs for
DSpace Objects that are unique across DSpace instances would be ideal
and would solve many internal history and referencing issues. I also
think it would open the door to certain replication scenarios. Finally,
keeping them VERY separate from PIs would stop folks from
conflating what they are. For instance,
http://dspace.myu.edu/handle/123456789.0/1 is just loaded with
unintended meaning; it's neither a handle nor persistent.
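
As a sketch of that separation (Python for brevity, with every class,
method and identifier string below being hypothetical rather than
anything in DSpace): each object gets an internal UUID that is never
shown to users, and PIs are ordinary metadata values that are indexed
and resolved back to objects.

```python
import uuid
from collections import defaultdict


class Repository:
    """Toy repository: internal UUIDs as keys, PIs as plain metadata."""

    def __init__(self):
        self.objects = {}                 # internal UUID -> metadata dict
        self.pi_index = defaultdict(set)  # PI string -> set of internal UUIDs

    def add_object(self, title, object_uuid=None):
        # Keep an incoming UUID (e.g. from a migration) when one is supplied.
        object_uuid = object_uuid or uuid.uuid4()
        if object_uuid in self.objects:
            raise ValueError("UUID already present in this repository")
        self.objects[object_uuid] = {"title": title, "identifiers": []}
        return object_uuid

    def add_identifier(self, object_uuid, pi):
        # A PI is just another metadata value that also happens to be indexed.
        self.objects[object_uuid]["identifiers"].append(pi)
        self.pi_index[pi].add(object_uuid)

    def resolve(self, pi):
        # Zero, one or several matches; several would mean "disambiguate".
        return sorted(self.pi_index.get(pi, set()), key=str)


repo = Repository()
article = repo.add_object("Some published article")
repo.add_identifier(article, "doi:10.9999/example")
repo.add_identifier(article, "hdl:123456789/1")

matches = repo.resolve("doi:10.9999/example")
missing = repo.resolve("hdl:000000000/0")
```

An object can carry any number of PIs from any scheme, and none of them
needs to appear in the object's internal key.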

-Mark

~~~~~~~~~~~~~
Mark R. Diggory - DSpace Systems Manager
MIT Libraries, Systems and Technology Services
Massachusetts Institute of Technology
Office: E25-131
Phone: (617) 253-1096


