Re: [Dspace-tech] Differences between the data model and the trunk
On Nov 21, 2007, at 2:44 AM, Larry Stone wrote:

> This is really getting out of scope for dspace-tech,

But it's important to discuss direction transparently, so I do not consider it out of scope.

> but I'd just like to make a plea to look at the data model in the abstract rather than at the implementation level: the way it appears in database tables *doesn't* *matter* at this stage of thinking about it, and I think it muddies the waters even to talk about them.

No, this is the dilemma: it really doesn't matter what we talk of conceptually; the only real thing is what is implemented. Everything else is a speculative attempt to describe what really exists as the implementation of the DSpace storage solution. As for DSpace 2.0, yes, we will have an abstract model, but it should be rooted in reality, in what can actually be accomplished with the existing storage solution, a.k.a. a relational database.

> There are objects, which perhaps have both attributes and relationships; that's the abstract way to discuss it. It is inconsequential whether attributes are implemented as columns in a table and relationships are an RDF triplestore -- what matters is the abstract model.

I would find it overly complex and dismaying if the result of the 2.0 re-architecture could not be expressed in simple relational terms.

> That said, I notice there is a tendency to add, or want to add, lots of different kinds of relationships to the data model. For example, an Item has an owning collection (or several) for the purpose of access control, and perhaps a different parent for UI appearance and yet another for navigation. Those could be typed relationships. There was some discussion of this in http://wiki.dspace.org/index.php/BitstreamRelationships too.

I'm actually just attempting to clarify what really exists in the existing implementation of our data model. We currently do have that relationship present in the schema. Thus, either the data model documentation is really out of date, or the column shouldn't exist in the schema implementing that model.

> Perhaps we could benefit from a very general relationship model that lets the API client create typed relationships between *any* DSOs, but of course it would need to enforce rules as well:
>
> - acceptable domain and range of each kind of relationship operator
> - schema restrictions on relationships (e.g. one-to-many, one-to-one)
> - access control on the relationships themselves.

Sure, and we will ultimately return to how these would be expressed in the schema. And I predict that, once proper normalization and exclusion of non-model properties have occurred in the database schema, the above would most certainly exist.

> All DSpace Objects would inherit some common traits, e.g. an identifier unique among all DSOs, and this mechanism that manages relationships between any DSOs. The mechanism implements all the schema restrictions and policies.

We already did all that at the Architectural review and the DAO prototype.

> ...but that's what I mean about keeping the discussion abstract: I'm not going to say if the relationship is really an RDF statement no matter how much it looks like one.. Let's just look at the problem without getting boxed into a particular solution.

Well, if you get too abstract, then nothing gets done; developers lose interest and threads die... We have a particular solution right now. My use of RDF or relational terms isn't to box us in, but to begin to draw out what is analogous across these technologies, and thus where there is true abstraction, not abstraction for abstraction's sake.

Cheers,
Mark
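As an illustration of the general relationship model Larry describes in this message (typed relationships between any DSOs, with domain/range rules and cardinality limits), here is a minimal Python sketch. It is not DSpace code: `RelationshipType`, `DSO`, `RelationshipStore` and the relationship names `memberOf` and `ownedBy` are all hypothetical, and access control on the relationships themselves is left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class RelationshipType:
    """One kind of relationship and its rules (hypothetical, for illustration)."""
    name: str
    domain_type: str                      # type of the source object, e.g. "Item"
    range_type: str                       # type of the target object, e.g. "Collection"
    max_per_source: Optional[int] = None  # None means unbounded ("to-many")


@dataclass(frozen=True)
class DSO:
    """Stand-in for a DSpace Object: an id unique among all DSOs, plus a type."""
    id: int
    type: str


@dataclass
class RelationshipStore:
    types: dict = field(default_factory=dict)
    triples: set = field(default_factory=set)  # (source_id, type_name, target_id)

    def register(self, rel):
        self.types[rel.name] = rel

    def relate(self, source, type_name, target):
        rel = self.types[type_name]
        # Domain/range rule: only the declared object types may be linked.
        if source.type != rel.domain_type or target.type != rel.range_type:
            raise ValueError(f"{type_name} must link {rel.domain_type} -> {rel.range_type}")
        triple = (source.id, type_name, target.id)
        if triple in self.triples:
            return  # already related; nothing to do
        # Cardinality rule: cap the number of targets per source.
        count = sum(1 for s, t, _ in self.triples if s == source.id and t == type_name)
        if rel.max_per_source is not None and count >= rel.max_per_source:
            raise ValueError(f"{type_name} allows at most {rel.max_per_source} per source")
        self.triples.add(triple)

    def targets(self, source, type_name):
        return sorted(o for s, t, o in self.triples if s == source.id and t == type_name)


store = RelationshipStore()
store.register(RelationshipType("memberOf", "Item", "Collection"))
store.register(RelationshipType("ownedBy", "Item", "Collection", max_per_source=1))

item, c1, c2 = DSO(1, "Item"), DSO(10, "Collection"), DSO(11, "Collection")
store.relate(item, "memberOf", c1)
store.relate(item, "memberOf", c2)
store.relate(item, "ownedBy", c1)
```

Nothing here depends on whether the triples end up in relational tables or a triplestore, which is the point both sides of the exchange seem to agree on; the disagreement is about which level to discuss first.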
Re: [Dspace-tech] Differences between the data model and the trunk
Mark,

I understand your opinion, but I'm not really sure that the owning collection is a relationship in the current model, or that we need such a relationship in future development.

At the moment we need the owning collection only to keep some presentation (display style, breadcrumbs) and authorization choices simple. Looking forward, I think that all collections to which the item is mapped really own the item; the item mapper feature should check not only for ADD permission on the target collection but also for ADMIN permission on the item itself.

Presentation issues should be resolved in their own domain: breadcrumbs should show the path the user followed to reach the item, and display should be based only on the item metadata.

So I propose renaming OwningCollection to SubmittedCollection, to keep this information alive (as with the submitter) even after the inProgressSubmission has become an archived Item. With this approach SubmittedCollection has to be considered an item attribute and not a relationship... (after some time, the SubmittedCollection might no longer even be one of the item's owners)

Andrea
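The permission rules discussed in this part of the thread can be sketched roughly as follows. This is only an illustration under stated assumptions: policies are modelled as a plain set of `(user, action, object)` tuples, "direct permission" is modelled as a `WRITE` policy on the item, and every function name is hypothetical rather than part of the DSpace authorization API.

```python
ADD, WRITE, ADMIN = "ADD", "WRITE", "ADMIN"


def has(policies, user, action, obj):
    """True if `user` holds `action` directly on `obj` (hypothetical policy set)."""
    return (user, action, obj) in policies


def can_map_item(policies, user, item, target_collection):
    """Proposed mapper rule: ADD on the target collection and ADMIN on the item itself."""
    return has(policies, user, ADD, target_collection) and has(policies, user, ADMIN, item)


def can_modify_via_owner(policies, user, item, owning_collection):
    """Fallback described in the thread: direct right, else ADMIN on the owning collection."""
    return has(policies, user, WRITE, item) or has(policies, user, ADMIN, owning_collection)


def can_modify_via_all(policies, user, item, collections):
    """Stricter alternative: direct right, else ADMIN on every collection holding the item."""
    if has(policies, user, WRITE, item):
        return True
    return bool(collections) and all(has(policies, user, ADMIN, c) for c in collections)


policies = {
    ("alice", ADMIN, "col-A"),   # alice administers only the owning collection
    ("alice", ADD, "col-B"),
    ("bob", ADMIN, "item-1"),    # bob administers the item itself
    ("bob", WRITE, "item-1"),
    ("bob", ADD, "col-B"),
}

print(can_map_item(policies, "bob", "item-1", "col-B"))
print(can_modify_via_owner(policies, "alice", "item-1", "col-A"))
print(can_modify_via_all(policies, "alice", "item-1", ["col-A", "col-B"]))
```

Under the stricter rule, an administrator of just one of the collections (alice above) can no longer manage an item that is also mapped elsewhere, which is the behaviour Andrea argues for.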
Re: [Dspace-tech] Differences between the data model and the trunk
On Nov 20, 2007, at 3:55 AM, Andrea Bollini wrote:

>> Larry Stone wrote:
>>
>> Collection * - * Item
>>
>> It's worth noting that while an Item may be a member of multiple Collections, it still refers to only one of them as its owner; it is returned by getOwningCollection().
>
> true but IMHO this is not really needed...

Well, what I suggested in my previous email is not whether it is needed, but where it should correctly reside in relational terms. Owner is a relationship, not an attribute of Item and Collection, so the more appropriate location would be in the container or relationship tables. I.e., rather than:

    ---
    -- Item table
    ---
    CREATE TABLE Item
    (
      item_id           INTEGER PRIMARY KEY,
      submitter_id      INTEGER REFERENCES EPerson(eperson_id),
      in_archive        BOOL,
      withdrawn         BOOL,
      last_modified     TIMESTAMP WITH TIME ZONE,
      owning_collection INTEGER
    );

    ---
    -- Collection2Item table
    ---
    CREATE TABLE Collection2Item
    (
      id            INTEGER PRIMARY KEY,
      collection_id INTEGER REFERENCES Collection(collection_id),
      item_id       INTEGER REFERENCES Item(item_id)
    );

instead have

    ---
    -- Item table
    ---
    CREATE TABLE Item
    (
      item_id       INTEGER PRIMARY KEY,
      submitter_id  INTEGER REFERENCES EPerson(eperson_id),
      in_archive    BOOL,
      withdrawn     BOOL,
      last_modified TIMESTAMP WITH TIME ZONE
    );

    ---
    -- Collection2Item table
    ---
    CREATE TABLE Collection2Item
    (
      id                INTEGER PRIMARY KEY,
      collection_id     INTEGER REFERENCES Collection(collection_id),
      item_id           INTEGER REFERENCES Item(item_id),
      owning_collection BOOL
    );

Ownership is a relationship and not an attribute, and the dependency is one way. This is also an example that enforces third normal form, because you cannot have an owning collection of which the item is not a member. (Although you can have multiple owners.)

>> When an Item is accessed directly, by itself without the navigational context of one of the Collections it belongs to, it consults the owning Collection for display style
>
> this is only one possibility and it is not the most useful (see MedataStyleSelection in 1.5)

Very true, and we will see that Manakin will mix this up even more.

>> and policies (e.g. access control by Collection admins).
>
> the auth system needs a lot of work. In my community admin patch, which introduces some hierarchical control, I have used the owning collection as the only real parent, i.e. if I want to modify the item but I don't have direct permission, I check for the ADMIN right on the owning collection... this is not optimal: if we have an item mapped into another collection, I think that only directly authorized people or the ADMINs of both collections should manage it.
>
> For Daniele's work I recommend keeping ownCollection in place, but I think that we need to remove it in a future version.
>
> PS: the concept of owning collection is also used in the workflow and submission system, but there is a main difference: the data are stored in the inProgressSubmission object, not in the item object. Also in this case I hope that we can introduce a more modular way to select the workflow process and submission process than simply using the owning collection.
>
> Andrea

Andrea, that is another example of what I speak of above. At least in this case, we see that inProgressSubmission (or WorkspaceItem) is really a container like a collection, and we've attached the item to the container rather than attaching the container to the item. I just think this is much cleaner, and it does not require altering the item table to move an item between WorkspaceItem and WorkflowItem when it is moved from Submission to Management Workflows.

I would support moving or removing owning_collection if it improved the model and the ability to work with tools. I think we would want a path of deprecation, however, and using the above relationship approach could more easily give us that.
-Mark

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
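The second schema in Mark's message, with ownership stored as a flag on the membership row, can be tried out with a small Python/SQLite sketch. The tables are trimmed to the columns that matter here. The partial unique index `one_owner_per_item` and the helpers `owning_collection` and `move_ownership` are assumptions added for illustration, not part of the proposal; the index is one possible answer to the "(Although you can have multiple owners)" caveat.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Collection (collection_id INTEGER PRIMARY KEY);
CREATE TABLE Item (item_id INTEGER PRIMARY KEY);

-- Membership table as proposed in the thread: ownership is a flag on the row.
CREATE TABLE Collection2Item (
    id                INTEGER PRIMARY KEY,
    collection_id     INTEGER REFERENCES Collection(collection_id),
    item_id           INTEGER REFERENCES Item(item_id),
    owning_collection BOOL
);

-- Assumption, not in the proposal: allow at most one owning row per item.
CREATE UNIQUE INDEX one_owner_per_item
    ON Collection2Item(item_id) WHERE owning_collection = 1;
""")

conn.executemany("INSERT INTO Collection VALUES (?)", [(10,), (11,)])
conn.execute("INSERT INTO Item VALUES (1)")
conn.executemany(
    "INSERT INTO Collection2Item (collection_id, item_id, owning_collection) VALUES (?, ?, ?)",
    [(10, 1, 1), (11, 1, 0)],  # item 1 is in both collections; collection 10 owns it
)
conn.commit()


def owning_collection(conn, item_id):
    """Return the owning collection id for an item, or None if it has no owner."""
    row = conn.execute(
        "SELECT collection_id FROM Collection2Item "
        "WHERE item_id = ? AND owning_collection = 1",
        (item_id,),
    ).fetchone()
    return row[0] if row else None


def move_ownership(conn, item_id, new_collection_id):
    """Make an existing membership row the owner; only Collection2Item is touched."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE Collection2Item SET owning_collection = 0 WHERE item_id = ?",
            (item_id,),
        )
        cur = conn.execute(
            "UPDATE Collection2Item SET owning_collection = 1 "
            "WHERE item_id = ? AND collection_id = ?",
            (item_id, new_collection_id),
        )
        if cur.rowcount != 1:
            raise ValueError("item is not a member of that collection")


print(owning_collection(conn, 1))
```

Because the flag lives on the membership row, an item can only be owned by a collection it belongs to, and changing the owner never alters the Item table.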
Re: [Dspace-tech] Differences between the data model and the trunk
There are some acknowledged discrepancies between the data model, the API, and the capabilities afforded by the UI. I'm not well placed to discuss the decisions made in the early development of DSpace, but I think the idea was to allow for many-to-many relationships in the model, while being more restrictive in the API/UI, mostly for pragmatic reasons (basically, it was really hard / problematic to reflect the many-to-many relationship all the way up the stack).

There are all sorts of implications of an object having multiple parents, the most obvious being authorization policies, but I'm sure there are more. Perhaps someone else (Rob?) can provide some insight here.

cheers,
Jim

--
James Rutherford
Research Engineer
HP Labs, Bristol, UK
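The split Jim describes, a permissive many-to-many store underneath a more restrictive API, might look roughly like this. It is a sketch only: `CommunityCollectionLinks` and `SingleParentApi` are hypothetical names, and the real DSpace API is not shaped like this.

```python
class CommunityCollectionLinks:
    """Storage layer: a plain many-to-many link set, like a link table with no constraints."""

    def __init__(self):
        self.links = set()  # (community_id, collection_id)

    def parents_of(self, collection_id):
        return sorted(c for c, k in self.links if k == collection_id)


class SingleParentApi:
    """API layer (hypothetical): refuses to give a collection a second parent."""

    def __init__(self, store):
        self.store = store

    def add_collection(self, community_id, collection_id):
        parents = self.store.parents_of(collection_id)
        if parents and parents != [community_id]:
            raise ValueError(
                f"collection {collection_id} already belongs to community {parents[0]}"
            )
        self.store.links.add((community_id, collection_id))


store = CommunityCollectionLinks()
api = SingleParentApi(store)
api.add_collection(1, 100)      # first parent: accepted

try:
    api.add_collection(2, 100)  # second parent through the API: rejected
    rejected = False
except ValueError:
    rejected = True

store.links.add((2, 100))       # the storage layer itself does not object
print(rejected, store.parents_of(100))
```

The last two lines show why the mismatch Daniele found can exist at all: anything that bypasses the API guard can still create the many-to-many rows.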
Re: [Dspace-tech] Differences between the data model and the trunk
The mismatch is very old... I think the API is the most reliable source of information. I assume all of us are happy with many-to-many associations among communities (including with themselves), collections, and items. I'm not sure we want to keep many-to-many for item-bundle-bitstream. In my opinion these are only one-to-many, and we should change the API (and the database) accordingly.

I know that Jim's versioning work uses the many-to-many association between item and bundle to share bundles between different versions. I hope we can change this to share only the physical location of the bitstreams; that should make it simpler to keep different versions that change only the metadata of a bitstream.

    Community  * - * Community
    Community  * - * Collection
    Collection * - * Item
    Item       1 - * Bundle
    Bundle     1 - * Bitstream

Best,
Andrea

--
Dott. Andrea Bollini
Technical lead, Java application development and training
Library and Electronic Publishing Services Section
CILEA, http://www.cilea.it
tel. +39 06-59292831
cel. +39 348-8277525
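One way to express Andrea's proposed `Item 1 - * Bundle` cardinality without dropping the link table is a uniqueness constraint on the child column. The Python/SQLite sketch below assumes a link table named `Item2Bundle`, modelled on the `Collection2Item` table quoted elsewhere in the thread; the `UNIQUE (bundle_id)` constraint is the illustrative addition, not something taken from the DSpace schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Item   (item_id   INTEGER PRIMARY KEY);
CREATE TABLE Bundle (bundle_id INTEGER PRIMARY KEY);

-- Link table (name assumed). Without the UNIQUE constraint it is many-to-many;
-- with it, each bundle can be linked to at most one item (one-to-many).
CREATE TABLE Item2Bundle (
    id        INTEGER PRIMARY KEY,
    item_id   INTEGER REFERENCES Item(item_id),
    bundle_id INTEGER REFERENCES Bundle(bundle_id),
    UNIQUE (bundle_id)
);

INSERT INTO Item VALUES (1), (2);
INSERT INTO Bundle VALUES (100), (101);
""")


def link(conn, item_id, bundle_id):
    conn.execute(
        "INSERT INTO Item2Bundle (item_id, bundle_id) VALUES (?, ?)",
        (item_id, bundle_id),
    )


link(conn, 1, 100)  # an item may still have many bundles
link(conn, 1, 101)

try:
    link(conn, 2, 100)  # but a bundle cannot be shared by a second item
    shared = True
except sqlite3.IntegrityError:
    shared = False

bundle_count = conn.execute(
    "SELECT COUNT(*) FROM Item2Bundle WHERE item_id = 1"
).fetchone()[0]
print(shared, bundle_count)
```

The same constraint on a Bundle-to-Bitstream link table would give `Bundle 1 - * Bitstream`, while the Community and Collection link tables would stay unconstrained.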
[Dspace-tech] Differences between the data model and the trunk
Hi all,

I'm writing to point out some differences between the data model shown on the DSpace site ( http://www.dspace.org/index.php?option=com_content&task=view&id=149#data_model ) and the actual code in /trunk ( http://dspace-sandbox.googlecode.com/svn/mirror/dspace/trunk/ ). The differences are in the associations between objects of the model: some one-to-many associations in the model are replaced in the code by many-to-many ones. I found three examples:

- In the model, a Collection can be owned by only one Community, whereas in the code a Collection can be owned by many Communities.
- In the model, a Bundle can be owned by only one Item, whereas in the code it can be owned by more than one Item.
- In the model, a Bitstream can be owned by only one Bundle, whereas in the code it can be owned by more than one.

In the database schema, all the associations are many-to-many; there are no restrictions. I also looked at Jim's DAO prototype ( http://dspace-sandbox.googlecode.com/svn/branches/dao-prototype/ ), and it reflects the database schema, using only many-to-many associations.

I'm working on a prototype to introduce Hibernate, and I'd base my work on Jim's choice, but I'd like some opinions first. What should we do? Update the data model associations? Follow the model and update the code in /trunk? Which associations should be kept and which dropped?

My mentor Andrea will give his opinions in a reply to this email, so that everyone can see them. We are both interested in hearing other opinions so we can choose the best way forward.

Cheers,
Daniele