On Wed, Oct 26, 2011 at 10:28:00PM -0700, Mark Diggory wrote:
> I'd say
> 1.) We should consider the MIT proposal to do away with Bundles; it's
> critical that we reduce the complexity in this area of DSpace and align
> it with the Fedora Object Model.  Bundles are poorly designed at the
> moment and are inadequate for capturing the internal structure of the
> content as we see in other approaches like Fedora.  The Hydra Common
> Module suggests this detail should be captured in a METS structmap
> bitstream, such that it can contain a richer hierarchy than a fixed
> layer of bundle structures can represent.

I'm not so sure about taking nice processed fields and storing them
away in metadata streams that have to be parsed all the time, but
that's another argument and at least partly a matter of taste.  But I
agree that Bundle doesn't really contain anything in a strong sense;
we use it more like an attribute.

We do need to move toward a more expressive model, and pay attention
to what is really being attributed:  a concrete object or a relationship.
If we want to fix things in the meantime, however, we can probably
reach design, consensus, and implementation a lot quicker for a DB
constraint, and still improve the model at whatever pace that requires.

> 2.) The uniqueness constraint and the sequence_id should be expressed
> in the database; such constraints shouldn't be bound in application
> code. This is an example of poor design choices on where to persist
> this detail: it should not have been stored on the bitstream table. It
> should have been in the bundle2bitstream table, or if you want it
> unique across all bitstreams in an item, there should have been an
> item2bitstream table that stored the sequence id; then the uniqueness
> constraint could have been enforced properly.
> Why? Because sequence id is irrelevant to the Bitstream itself; it is
> an attribute of the container/collection object that is aggregating
> the bitstreams. A sequence id is meaningless in relation to the
> Bitstreams that are used in Community and Collection logos, etc. It is
> only relevant to the Item/Bundle container holding the Bitstream.

That's what I thought until I read Richard Rodgers' note on the
design.  If sequence_id *were* used for sequencing things, then it
would properly be an attribute of the relationship between the
bitstream and the thing within which it is sequenced, not of either
object.

But that's not what it's supposed to be for; it's a stable identifier
for the bitstream, unique within item but otherwise meaningless.  It
should've been called something like name or serial_no to indicate
that it is uninterpretable.

If bitstreams need to be mappable to multiple containers, then the
sequence_id needs to be globally unique and we should reconsider
something like hashing a timestamp.
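
To make "hashing a timestamp" concrete, here is a rough sketch in
Python. The function name is hypothetical, and the random salt is my
own assumption (without it, two bitstreams accepted in the same clock
tick would get the same digest); nothing like this exists in DSpace
as far as this note is concerned.

```python
import hashlib
import os
import time


def new_bitstream_name() -> str:
    """Return an opaque identifier derived from a hashed timestamp.

    Hypothetical helper. The random salt is an assumption of this
    sketch: it keeps two calls in the same clock tick from producing
    the same digest.
    """
    stamp = time.time_ns().to_bytes(16, "big")  # nanoseconds since the epoch
    salt = os.urandom(8)                        # assumed tie-breaker
    return hashlib.sha1(stamp + salt).hexdigest()
```

The result is uninterpretable by design: nothing can be read out of
it, and nothing orders by it.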

In any case, unless someone can show significant performance problems
with the somewhat complicated SQL required, I think the DBMS is the
best tool to check for uniqueness.  We only pay the cost when a
bitstream is being accepted, which should be rare.  That suggests that
here "significant" means "submission takes noticeably longer".
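
As a rough illustration of the "somewhat complicated SQL", here is a
sketch that lets the DBMS reject a duplicate sequence_id within an
item. Everything in it is an assumption for the sake of the example:
the three tables are cut down to the columns that matter, the trigger
name and the attach() helper are made up, and it uses SQLite through
Python only so that it is self-contained. A real installation would
need the equivalent in its own DBMS's trigger dialect.

```python
import sqlite3

# Simplified, assumed schema: only the columns needed for the check.
SCHEMA = """
CREATE TABLE bitstream (bitstream_id INTEGER PRIMARY KEY, sequence_id INTEGER);
CREATE TABLE item2bundle (item_id INTEGER, bundle_id INTEGER);
CREATE TABLE bundle2bitstream (bundle_id INTEGER, bitstream_id INTEGER);

-- Hypothetical trigger: refuse to link a bitstream into a bundle if any
-- other bitstream in the same item already carries its sequence_id.
CREATE TRIGGER unique_sequence_per_item
BEFORE INSERT ON bundle2bitstream
WHEN EXISTS (
    SELECT 1
    FROM item2bundle new_ib
    JOIN item2bundle ib ON ib.item_id = new_ib.item_id
    JOIN bundle2bitstream bb ON bb.bundle_id = ib.bundle_id
    JOIN bitstream b ON b.bitstream_id = bb.bitstream_id
    JOIN bitstream nb ON nb.bitstream_id = NEW.bitstream_id
    WHERE new_ib.bundle_id = NEW.bundle_id
      AND b.sequence_id = nb.sequence_id
      AND b.bitstream_id <> NEW.bitstream_id
)
BEGIN
    SELECT RAISE(ABORT, 'sequence_id already used in this item');
END;
"""


def open_db():
    """Create an in-memory database with the assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def attach(conn, bundle_id, bitstream_id, sequence_id):
    """Store a bitstream and link it to a bundle.

    Returns True if the DBMS accepts it, False if the trigger (or any
    other database error) rejects it; a rejection rolls back both inserts.
    """
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO bitstream VALUES (?, ?)",
                         (bitstream_id, sequence_id))
            conn.execute("INSERT INTO bundle2bitstream VALUES (?, ?)",
                         (bundle_id, bitstream_id))
        return True
    except sqlite3.DatabaseError:
        return False
```

The join runs only when a bitstream is linked into a bundle, which is
the one place the cost would be paid. Had sequence_id lived in an
item2bitstream table as suggested above, a plain UNIQUE (item_id,
sequence_id) constraint would do the same job without a trigger.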

BTW there's another bit of strangeness:  the field is stored in
Bitstream but the behavior all seems to be in Item.

-- 
Mark H. Wood, Lead System Programmer   [email protected]
Asking whether markets are efficient is like asking whether people are smart.

_______________________________________________
Dspace-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-devel