On 11 November 2013 15:26, Tim Donohue <tdono...@duraspace.org> wrote:

> Whatever we use, I think the IDs surfaced by the REST API do need to be
> persistent. They need not necessarily be globally unique/persistent
> (like a handle or DOI), but they should be persistent within DSpace,
>
> Database IDs are definitely *not* persistent. They are just an
> incrementing ID, and that ID changes if you move an object from one
> DSpace to another.


But if we are saying that it does not need to be globally unique or
globally persistent, then the ID changing when you move an object to
another DSpace instance is not really an issue.


> It's worth also pointing out that the AIP system does
> NOT persist database IDs (and says that explicitly). So, even restoring
> content from AIPs will often not provide the same Database ID. It'd be
> unfortunate if upon restoring content to your DSpace (from an AIP), it's
> no longer available at the same location from the REST API.
>

Unfortunate, but I don't see it as a deal breaker. The primary disaster
recovery strategy would be to back up and restore the database (and files) -
which would retain the same ID.

We can't guarantee that database IDs will persist across DSpace versions
(i.e. during an upgrade) - and we should explicitly state that.

External systems integrating via REST need to be able to cope with content
being removed, and added. As long as they do this, then they should cope
with content being selectively deleted and subsequently re-imported via an
AIP.
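
To illustrate the kind of coping I mean, here is a minimal sketch in Python. It assumes the external system keeps a set of the IDs it has already seen and can obtain a current listing of IDs from the repository by some means; the function name and the example IDs are hypothetical and not part of any DSpace API.

```python
def reconcile(cached_ids, current_ids):
    """Compare the IDs a client has cached with the IDs the repository reports now.

    Returns the IDs the client should fetch (added) and the IDs it should
    drop (removed). Nothing here assumes an ID survives a delete/re-import.
    """
    cached = set(cached_ids)
    current = set(current_ids)
    return {
        "added": sorted(current - cached),
        "removed": sorted(cached - current),
    }


# Hypothetical case: item 101 was deleted and re-imported from an AIP,
# coming back under a new database ID (250).
changes = reconcile([100, 101, 102], [100, 102, 250])
```

A client built this way treats the re-imported item as one removal plus one addition, which is the same path it already needs for ordinary deletes and new submissions.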


> Currently our only DSpace persistent IDs to choose from are Handles and
> now DOIs (in 4.0). I see no reason why we shouldn't use either (or both)
> of these in the REST API, especially since we have no other DSpace PID
> or UUID to work from (at least not yet).
>

Bear in mind that they aren't necessarily persistent either - we ship out
of the box with a fake handle prefix, and tools to be able to reset the
prefix - i.e. in the case that a prefix gets registered later.

Therefore, a persistent ID is only as persistent as the implementing
institution chooses for it to be!

So, for the purposes of a REST API, I believe that the database ID is
"persistent enough". External systems have to cope with variation over
time, and can always be flushed during a major upgrade of DSpace, if
necessary.

However...

> We don't necessarily need to identify what the persistent ID "means" via
> REST API. But, it should be persistent.
>
> Therefore an identifier:
>
> http://localhost:8080/webapi/content/collection/12.34/56
>
> could be either a Handle (hdl:12.34/56) or a DOI (doi:12.34/56) or some
> other sort of PID.


What's more important here is that the REST API uses a *public* identifier,
which is consistent with the UI.

The primary instance URL of an item is based on its handle, rightly or
wrongly (mostly wrongly!). You should be able to take something that is
used to identify an object in DSpace, and use it in the REST API.

You know the "handle" (fake or real) for an item. You don't necessarily
know its database ID. And that's what makes using database ID in REST a
problem.

Ideally, I would like to see a primary object URL using an
instance-persistent public ID - which is not the handle or DOI - and that
would be the primary ID that you use in the REST API. But, on top of that,
if you configure a real persistent identifier, you could get to the object
- both in the UI and the API, using that identifier.
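
A rough sketch of that arrangement, in Python purely for illustration: each object gets one instance-persistent primary ID, and any configured handle or DOI is registered as an alias that resolves to it. The class, its methods, and the use of a UUID as the primary ID are all assumptions made for the example; none of this is existing DSpace code.

```python
import uuid


class IdentifierRegistry:
    """Hypothetical in-memory registry: one primary ID per object, plus aliases."""

    def __init__(self):
        self._objects = {}  # primary ID -> object
        self._aliases = {}  # external identifier, e.g. "hdl:12.34/56" -> primary ID

    def register(self, obj):
        """Store an object and return its newly minted primary ID."""
        primary_id = str(uuid.uuid4())
        self._objects[primary_id] = obj
        return primary_id

    def add_alias(self, alias, primary_id):
        """Point an external identifier (handle, DOI, ...) at an existing object."""
        if primary_id not in self._objects:
            raise KeyError(primary_id)
        self._aliases[alias] = primary_id

    def resolve(self, identifier):
        """Look up by alias first, then fall back to treating it as a primary ID."""
        primary_id = self._aliases.get(identifier, identifier)
        return self._objects.get(primary_id)


registry = IdentifierRegistry()
item_id = registry.register({"title": "Example item"})
registry.add_alias("hdl:12.34/56", item_id)
```

With this shape, resetting a fake handle prefix later only means adding a new alias; the primary ID used by the UI and the API does not change.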


> We don't necessarily need to specify which it is (at
> least not on the URI). But we do need to ensure it is unique and
> persistent within DSpace. That way if someone requests 12.34/56 today,
> and also requests it a year from now, they should be getting the same
> object from that particular DSpace.   Unfortunately, that's not a
> guarantee we can make with Database IDs, as nothing in DSpace ensures
> their persistence after upgrades or data migrations or restorations from
> backup.
>

Which brings us to the question of what representation you get back. Is it
necessarily the same as the representation you got a year ago? What about
if you kept the object / identifier and replaced all of the metadata and
the files? I realise it's a rather bizarre example, that isn't likely to
occur in real life, but in general, a little too much stock is invested in
persistent identifiers. They are important in the overall scheme of things,
but when it comes to the API, and the situations it is meant to support,
being consistent is far more important than being persistent.

In reality, we are rather limited in what we call public identifiers in
DSpace, but we're possibly causing problems by having a formal contract of
persistence on REST API identifiers.

G