On Wed, Oct 26, 2011 at 10:28:00PM -0700, Mark Diggory wrote:
> I'd say
> 1.) We should consider the MIT proposal to do away with Bundles, its
> critical that we reduce the complexity in this area of DSpace an align
> it with the Fedora Object Model. Bundles are poorly designed at the
> moment and are inadequate for capturing the internal structure of the
> content as we see in other approaches like Fedora. The Hydra Common
> Module suggests this detail should be captured in a METS structmap
> bitstream, such that it can contain a richer hierarchy than a fixed
> layer of bundle structures can represent.

I'm not so sure about taking nice processed fields and storing them
away in metadata streams that have to be parsed all the time, but
that's another argument and at least partly a matter of taste.

But I agree that Bundle doesn't really contain anything in a strong
sense; we use it more like an attribute. We do need to move toward a
more expressive model, and pay attention to what is really attributed:
a concrete object or a relationship.

If we want to fix things in the meantime, however, we can probably get
design, consensus, and implementation a lot quicker for a DB constraint
and still improve the model at whatever pace that needs to go.

> 2.) The uniqueness constraint and the sequence_id should be expressed
> in the database, such constraints shouldn't be bound in application
> code, this is an example of poor design choices on where to persist
> this detail, it should not have been stored on the bitstream table, it
> should have been in the bundle2bitstream table, or if you want it
> unique across all bitstreams in an item, there should have been a
> item2bitstream table that stored the sequence id, then the uniqueness
> constraint could have been enforced properly.
> Why, because sequence id is irrelevant to the Bitstream itself, it is
> an attribute of the container/collection object that is aggregating
> the bitstreams. A sequence id is meaningless in relation to the
> Bitstreams that are used in Community and Collection logos, etc. It is
> only relevant to the Item/Bundle container holding the Bitstream.

That's what I thought until I read Richard Rodgers' note on the
design. If sequence_id *were* used for sequencing things, then it
would properly be an attribute of the relationship between the
bitstream and the thing within which it is sequenced, not of either
object. But that's not what it's supposed to be for; it's a stable
identifier for the bitstream, unique within item but otherwise
meaningless.
It should've been called something like name or serial_no to
indicate that it is uninterpretable. If bitstreams need to be mappable
to multiple containers then the sequence_id now needs to be globally
unique and we should reconsider something like hashing a timestamp.

In any case, unless someone can show significant performance problems
with the somewhat complicated SQL required, I think the DBMS is the
best tool to check for uniqueness. We only pay the cost when a
bitstream is being accepted, which should be rare. That suggests that
here "significant" means "submission takes noticeably longer".

BTW there's another bit of strangeness: the field is stored in
Bitstream but the behavior all seems to be in Item.

-- 
Mark H. Wood, Lead System Programmer   [email protected]
Asking whether markets are efficient is like asking whether people are smart.
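
For illustration only, here is a minimal sketch of what "let the DBMS check uniqueness" could look like. It assumes the item2bitstream table suggested in the quoted message rather than the current layout, and it uses an in-memory SQLite database so it runs anywhere. The table layout, column names, sample ids, and the helper names open_db() and attach() are all hypothetical; none of this is the actual DSpace schema or code.

```python
import sqlite3

# Hypothetical, simplified schema (an assumption for this sketch, NOT the
# real DSpace tables). sequence_id lives on the item-to-bitstream mapping,
# and the UNIQUE constraint makes the database reject duplicates per item.
SCHEMA = """
CREATE TABLE item      (item_id      INTEGER PRIMARY KEY);
CREATE TABLE bitstream (bitstream_id INTEGER PRIMARY KEY);
CREATE TABLE item2bitstream (
    item_id      INTEGER NOT NULL REFERENCES item(item_id),
    bitstream_id INTEGER NOT NULL REFERENCES bitstream(bitstream_id),
    sequence_id  INTEGER NOT NULL,
    PRIMARY KEY (item_id, bitstream_id),
    UNIQUE (item_id, sequence_id)
);
"""


def open_db():
    """Create an in-memory database with the sketch schema and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO item VALUES (?)", [(1,), (2,)])
    conn.executemany("INSERT INTO bitstream VALUES (?)", [(10,), (11,)])
    conn.commit()
    return conn


def attach(conn, item_id, bitstream_id, sequence_id):
    """Map a bitstream into an item; return False if the database refuses."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO item2bitstream (item_id, bitstream_id, sequence_id) "
                "VALUES (?, ?, ?)",
                (item_id, bitstream_id, sequence_id),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised for the duplicate (item_id, sequence_id) pair, among other
        # constraint violations such as a missing item or bitstream.
        return False


if __name__ == "__main__":
    db = open_db()
    print(attach(db, 1, 10, 1))  # first bitstream in item 1
    print(attach(db, 1, 11, 1))  # same sequence_id in the same item
    print(attach(db, 2, 11, 1))  # same sequence_id, different item
```

In this layout the check is only paid on the insert, which matches the point above that the cost arises when a bitstream is accepted. With sequence_id left on the bitstream table, the equivalent check would need joins through the container tables, which is presumably the "somewhat complicated SQL" referred to above.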
_______________________________________________
Dspace-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-devel
