On Wed, Oct 26, 2011 at 02:25:24PM -0500, Tim Donohue wrote:
> Small comment on the 'uniqueness' of 'sequence_id'
> 
> On 10/26/2011 1:47 PM, Mark Diggory wrote:
> >> My answer would be that bitstream 'sequence_id' should be unique
> >> *per-Item*. Admittedly though, this doesn't seem to be documented anywhere.
> >
> > It has absolutely no uniqueness constraint on it in the database.
> >
> >     sequence_id             INTEGER
> 
> To be honest, the lack of uniqueness constraint in the DB doesn't mean 
> it still shouldn't be unique per-Item.

It may mean that there *should* be a uniqueness constraint in the
database.  But we would also need to catch the resulting errors and
interpret them where possible.

> If it did have a uniqueness constraint, that would force sequence_id to 
> be unique across *all* bitstreams in the system.
> 
> sequence_id   INTEGER UNIQUE

If you do it that way.

CREATE VIEW BitstreamSequences AS
   SELECT item_id, bundle_id sequence_id
    FROM Item
     JOIN Item2Bundle USING(item_id)
     JOIN Bundle2Bitstream USING(bundle_id)
     JOIN Bitstream USING(bitstream_id);

CREATE CONSTRAINT ic_bitstream_sequences CHECK (
 EXISTS (SELECT * from BitstreamSequence AS bs1 WHERE
 NOT EXISTS (SELECT select * FROM BitstreamSequences AS bs2
  WHERE bs1.item_id = bs2.item_id AND bs1.bundle_id != bs2.bundle_id
  AND bs1.sequence_id = bs2.sequence_id)
 )
);

(I think -- not tested, and my SQL-fu is perhaps not yet that strong.)

> All that being said, obviously Mark Wood found out that we really aren't 
> properly checking that a sequence_id is unique per Item.  All we are 
> doing is attempting to ensure that, when you add bitstreams through the 
> DSpace Java API (either via UI or CLI), each of those bitstreams will be 
> assigned a sequence_id which is unique within its associated Item.
> 
> But, if "bad data" gets in your DB somehow (by bypassing our DSpace Java 
> API), we aren't warning the user or telling the user that his/her data 
> is "invalid" (which is what Mark Wood discovered the hard way).

The question, then, is whether to use the DBMS to check this stuff or
build something custom.  I think that integrity checks need to be as
close as possible to the data, which suggests the DBMS.  Looking at
the monstrosity I wrote above, I'm not so sure. :-/

-- 
Mark H. Wood, Lead System Programmer   [email protected]
Asking whether markets are efficient is like asking whether people are smart.

Attachment: pgpoA6y20AF02.pgp
Description: PGP signature

------------------------------------------------------------------------------
The demand for IT networking professionals continues to grow, and the
demand for specialized networking skills is growing even more rapidly.
Take a complimentary Learning@Cisco Self-Assessment and learn 
about Cisco certifications, training, and career opportunities. 
http://p.sf.net/sfu/cisco-dev2dev
_______________________________________________
Dspace-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-devel

Reply via email to