<this somehow bounced from the tech list yesterday> I throw up a flag of caution. I think there are concerns with unpredictable behavior of authorization/policies when such a behavior is allowed (many to many) and we see that in the the database schema where we have added a column to "Item" pertaining to its "owning collection". We should be careful that the issues that arise around policy enforcement are not eroded by exposing the many to many relationships found in the model/database to the business logic layer. There is a bit of duct tape over this model that keeps things from getting out of control which we need to be aware of and I'm not sure if it works as well for communities as it does for collections.
As well, we could really use someone looking at database "normalization" in areas like this.. I.E. should the Item table contain any reference to its owning collection, or should that be in another table? It happens again in Bundle with "primary bitstream id". Here the Bundle table references a bitstream outside of the "Bundle2Bitstream" table. I think its great to talk about the model by looking that the database schema itself. So I've inlined some of the important structures below It may be helpful to our 2.0 efforts to consider an above "normalization" effort across the whole database to assure that we are doing things in the best possible approach (this isn't science, this is just good database design). For instance, one might consider that we need to normalize much of the "workflow" expressed in the Collection table by separating it out such that its in another table referring to the Collection table. In fact, we can apply this logic all over the database, After which we will begin to see that Community, Collection, Item, Bundle and Bitstream tables are really converging on the same schema structure (they should be, they are all DSOs) and that the structure would really only end up holding those attributes that are in common across those objects. Now, I'm not saying its ideal to put them all into the same "table", separate tables are an optimization of container vs contained. Many containers have very fewer members which are accessed at a higher frequency. What I am saying is that there is a "template" for DSOs that we might work to make more formal for 2.0. And that we will be expressing things like the "structure" of the relationships between DSO's separately from the attachment of "properties" or "metadata" and that the decisions about this "structure" should be part of the "core" dspace teams mandate. Daniele, thanks for opening up this "can of worms" this is really what I hoped your efforts to apply defacto standard tools like Hibernate and Spring would begin to expose and get us talking about. Welcome on board! p.s. this discussion is moving to be very developer oriented, but I don't want to derail it by moving it to the dev list. I think we might get some interesting non-developer input if we leave it here as well. > ------------------------------------------------------- > -- Community table > ------------------------------------------------------- > CREATE TABLE Community > ( > community_id INTEGER PRIMARY KEY, > name VARCHAR(128), > short_description VARCHAR(512), > introductory_text TEXT, > logo_bitstream_id INTEGER REFERENCES Bitstream(bitstream_id), > copyright_text TEXT, > side_bar_text TEXT > ); > ------------------------------------------------------- > -- Collection table > ------------------------------------------------------- > CREATE TABLE Collection > ( > collection_id INTEGER PRIMARY KEY, > name VARCHAR(128), > short_description VARCHAR(512), > introductory_text TEXT, > logo_bitstream_id INTEGER REFERENCES Bitstream(bitstream_id), > template_item_id INTEGER REFERENCES Item(item_id), > provenance_description TEXT, > license TEXT, > copyright_text TEXT, > side_bar_text TEXT, > workflow_step_1 INTEGER REFERENCES EPersonGroup > ( eperson_group_id ), > workflow_step_2 INTEGER REFERENCES EPersonGroup > ( eperson_group_id ), > workflow_step_3 INTEGER REFERENCES EPersonGroup > ( eperson_group_id ), > submitter INTEGER REFERENCES EPersonGroup > ( eperson_group_id ), > admin INTEGER REFERENCES EPersonGroup( eperson_group_id) > ); > ------------------------------------------------------- > -- Item table > ------------------------------------------------------- > CREATE TABLE Item > ( > item_id INTEGER PRIMARY KEY, > submitter_id INTEGER REFERENCES EPerson(eperson_id), > in_archive BOOL, > withdrawn BOOL, > last_modified TIMESTAMP WITH TIME ZONE, > owning_collection INTEGER > ); > ------------------------------------------------------- > -- Bundle table > ------------------------------------------------------- > CREATE TABLE Bundle > ( > bundle_id INTEGER PRIMARY KEY, > name VARCHAR(16), -- ORIGINAL | THUMBNAIL | TEXT > primary_bitstream_id INTEGER REFERENCES Bitstream(bitstream_id) > ); > ------------------------------------------------------- > -- Bitstream table > ------------------------------------------------------- > CREATE TABLE Bitstream > ( > bitstream_id INTEGER PRIMARY KEY, > bitstream_format_id INTEGER REFERENCES > BitstreamFormatRegistry(bitstream_format_id), > name VARCHAR(256), > size_bytes BIGINT, > checksum VARCHAR(64), > checksum_algorithm VARCHAR(32), > description TEXT, > user_format_description TEXT, > source VARCHAR(256), > internal_id VARCHAR(256), > deleted BOOL, > store_number INTEGER, > sequence_id INTEGER > ); > ------------------------------------------------------- > -- Community2Community table > ------------------------------------------------------- > CREATE TABLE Community2Community > ( > id INTEGER PRIMARY KEY, > parent_comm_id INTEGER REFERENCES Community(community_id), > child_comm_id INTEGER REFERENCES Community(community_id) > ); > ------------------------------------------------------- > -- Community2Collection table > ------------------------------------------------------- > CREATE TABLE Community2Collection > ( > id INTEGER PRIMARY KEY, > community_id INTEGER REFERENCES Community(community_id), > collection_id INTEGER REFERENCES Collection(collection_id) > ); > ------------------------------------------------------- > -- Collection2Item table > ------------------------------------------------------- > CREATE TABLE Collection2Item > ( > id INTEGER PRIMARY KEY, > collection_id INTEGER REFERENCES Collection(collection_id), > item_id INTEGER REFERENCES Item(item_id) > ); > ------------------------------------------------------- > -- Item2Bundle table > ------------------------------------------------------- > CREATE TABLE Item2Bundle > ( > id INTEGER PRIMARY KEY, > item_id INTEGER REFERENCES Item(item_id), > bundle_id INTEGER REFERENCES Bundle(bundle_id) > ); > ------------------------------------------------------- > -- Bundle2Bitstream table > ------------------------------------------------------- > CREATE TABLE Bundle2Bitstream > ( > id INTEGER PRIMARY KEY, > bundle_id INTEGER REFERENCES Bundle(bundle_id), > bitstream_id INTEGER REFERENCES Bitstream(bitstream_id) > ); On Nov 19, 2007, at 8:34 AM, Andrea Bollini wrote: > The mismatch is very old... > I think that the API is the most right source of info, I assume that > all of us are happy with many-to-many associations for communities > (also > with themselves), collections & items. I'm not sure if we want keep a > many-to-many for item-bundle-bitstream. In my opinion these are only > one-to-many and we should change the API (and db) in this way. I know > that the Jim work about versioning uses the many-to-many > association for > item & bundle to share some of them between different versions, I hope > that we can change this to share only the physical location of the > bitstreams. In this way should be more simple to keep different > versions > that change only metadata of a bitstream. > > Community * - * Community > Community * - * Collection > Collection * - * Item > Item 1 - * Bundle > Bundle 1 - * Bitstrea > > Best, > Andrea > > daniele.ninfo ha scritto: >> Hi all, >> >> I'm writing to underline some differences between the data model >> shown in the dspace site >> ( http://www.dspace.org/index.php? >> option=com_content&task=view&id=149#data_model ) and the actual >> code present in the /trunk ( http://dspace-sandbox.googlecode.com/ >> svn/mirror/dspace/trunk/ ). >> >> There are differences in the associations between objects of the >> model: some one-to-many associations in the model are replaced in >> the code by many-to-many ones. I found 3 examples: >> >> - In the model, a Collection can be owned by only one Community, >> whereas in the code a Collection can be owned by many Community. >> - In the model, a Bundle can be owned by only one Item, and in the >> code it can be owned by more then one Item >> - In the model, a Bitstream can be owned by only one Bundle, and >> in the code by more then one. >> >> In the database schema, all the associations are many-to-many, >> there are no restrictions. >> I also looked at Jim's DAO-prototype ( http://dspace- >> sandbox.googlecode.com/svn/branches/dao-prototype/ ), and he >> reflects the database schema, using only many-to-many associations. >> >> I'm working on a prototype to introduce Hibernate, and i'd base my >> work on Jim's choice, but i'd like to have some opinions about it, >> what should we do? update the data model associations? follow the >> model and update the code in /trunk? what kind of associations >> should be kept and what dropped? >> >> My mentor Andrea will write his opinions replying to this email, >> to let everyone know them. We both are interested in having other >> opinions to choose the best way to go ahead. >> ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2005. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ _______________________________________________ DSpace-tech mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/dspace-tech

