Re: [Dspace-tech] Differences between the data model and the trunk

2007-11-26 Thread Mark Diggory

On Nov 21, 2007, at 2:44 AM, Larry Stone wrote:

 This is really getting out of scope for dspace-tech,

But it's important to discuss direction transparently, so I do not
consider it out of scope.

 but I'd just like
 to make a plea to look at the data model in the abstract rather than
 at the implementation level: the way it appears in database tables
 *doesn't* *matter* at this stage of thinking about it, and I think it
 muddies the waters even to talk about them.

No, this is the dilemma: it really doesn't matter what we talk about
conceptually; the only real thing is what is implemented.
Everything else is a speculative attempt to describe what actually
exists as the implementation of the DSpace storage solution.  As for
DSpace 2.0, yes, we will have an abstract model, but it should be
rooted in reality, in what can actually be accomplished with the
existing storage solution, a.k.a. a relational database.


 There are objects, which perhaps have both attributes and relationships;
 that's the abstract way to discuss it.  It is inconsequential whether
 attributes are implemented as columns in a table and relationships are
 an RDF triplestore -- what matters is the abstract model.

I would find it overly complex and dismaying if the result of the 2.0  
re-architecture could not be expressed in simple relational terms.

 That said, I notice there is a tendency to add, or want to add, lots of
 different kinds of relationships to the data model.  For example, an
 Item has an owning collection (or several) for the purpose of access
 control, and perhaps a different parent for UI appearance and yet
 another for navigation.  Those could be typed relationships.  There was
 some discussion of this in
 http://wiki.dspace.org/index.php/BitstreamRelationships too.

I'm actually just attempting to clarify what really exists in the
current implementation of our data model.  We currently do have that
relationship present in the schema. Thus, either the data model
documentation is badly out of date, or the column shouldn't exist in
the schema implementing that model.

 Perhaps we could benefit from a very general relationship model that
 lets the API client create typed relationships between *any* DSOs,
 but of course it would need to enforce rules as well:
 - acceptable domain and range of each kind of relationship operator
 - schema restrictions on relationships (e.g. one-to-many, one-to-one)
 - access control on the relationships themselves.

Sure, and we will ultimately return to how these would be expressed
in the schema. I predict that once the database schema has been
properly normalized and non-model properties excluded from it,
the above would most certainly exist.
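
As a rough illustration of the rules listed above, here is a minimal sketch of a typed-relationship store that checks domain/range and a cardinality limit before recording a relationship. Every name in it (RelationType, RelationStore, "contains", "owns") is hypothetical and not part of the DSpace API, and access control on the relationships themselves is left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class RelationType:
    """A hypothetical relationship kind together with its schema rules."""
    name: str
    domain_type: str                     # DSO type allowed as subject
    range_type: str                      # DSO type allowed as object
    max_subjects: Optional[int] = None   # subjects per object; None = unlimited


@dataclass
class RelationStore:
    types: dict = field(default_factory=dict)
    triples: set = field(default_factory=set)   # (subject, type name, object)

    def register(self, rel_type):
        self.types[rel_type.name] = rel_type

    def relate(self, subject, rel_name, obj):
        """Record a relationship; subject and obj are (dso_type, id) pairs."""
        rel = self.types[rel_name]
        # Rule 1: acceptable domain and range for this relationship type.
        if subject[0] != rel.domain_type or obj[0] != rel.range_type:
            raise ValueError(f"{rel_name}: expected {rel.domain_type} -> {rel.range_type}")
        # Rule 2: cardinality restriction (e.g. one-to-many).
        holders = {s for (s, r, o) in self.triples if r == rel_name and o == obj}
        if (rel.max_subjects is not None and subject not in holders
                and len(holders) >= rel.max_subjects):
            raise ValueError(f"{rel_name}: {obj} already has {rel.max_subjects} subject(s)")
        self.triples.add((subject, rel_name, obj))


store = RelationStore()
store.register(RelationType("contains", "Collection", "Item"))           # many-to-many
store.register(RelationType("owns", "Collection", "Item", max_subjects=1))
store.relate(("Collection", 1), "contains", ("Item", 10))
store.relate(("Collection", 2), "contains", ("Item", 10))
store.relate(("Collection", 1), "owns", ("Item", 10))
```

With these assumed types, asking for a second "owns" relationship on the same Item, or relating an Item to a Collection in the wrong direction, raises ValueError, while membership stays unrestricted.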

 All DSpace Objects would inherit some common traits, e.g. an identifier
 unique among all DSOs, and this mechanism that manages relationships
 between any DSOs.  The mechanism implements all the schema restrictions
 and policies.

We already did all that at the Architectural review and the DAO  
prototype.


 ...but that's what I mean about keeping the discussion abstract: I'm
 not going to say if the relationship is really an RDF statement
 no matter how much it looks like one.  Let's just look at the problem
 without getting boxed into a particular solution.

Well, if you get too abstract, then nothing gets done; what's more,
developers lose interest and threads die...  We have a particular
solution right now. My use of RDF or relational terms isn't to box
us in, but to begin to draw out what is analogous across these
technologies, and thus where there is true abstraction, not
abstraction for abstraction's sake.

Cheers,
Mark


 -- Larry


Re: [Dspace-tech] Differences between the data model and the trunk

2007-11-22 Thread Andrea Bollini
Mark,
I understand your opinion, but I'm not really sure that the owning
collection is a relationship in the current model, or that we need a
similar relationship in future development.
At the moment we need the owning collection only to keep some
presentation (display style, breadcrumbs) & authorization choices
simple. Looking forward, I think that
all collections where the item is mapped really own the item; the item
mapper feature should check not only for ADD permission on the target
collection but also for ADMIN permission on the item itself.
Presentation issues should be resolved in their own domain: breadcrumbs
should show the path the user took to reach the item, display should
be based only on the item metadata, and so on.
I propose renaming OwningCollection to SubmittedCollection to keep
this information alive (as for submitter) even after the
inProgressSubmission becomes an Archived Item.
With this approach SubmittedCollection has to be considered an item
attribute and not a relationship... (after some time, the
SubmittedCollection might no longer even be an owner of the item)
Andrea



Re: [Dspace-tech] Differences between the data model and the trunk

2007-11-20 Thread Mark Diggory

On Nov 20, 2007, at 3:55 AM, Andrea Bollini wrote:

 Larry Stone wrote:
 Collection * - * Item


 It's worth noting that while an Item may be a member of multiple
 Collections, it still refers to only one of them as its owner; it is
 returned by getOwningCollection().
 true but IMHO this is not really needed...

Well, what I've suggested in my previous email is not whether it is
needed or not, but where it should correctly reside in relational
terms. Owner is a relationship, not an attribute of Item and
Collection; thus the more appropriate location would be in the
container or relationship tables, i.e.

rather than:

 ---
 -- Item table
 ---
 CREATE TABLE Item
 (
   item_id           INTEGER PRIMARY KEY,
   submitter_id      INTEGER REFERENCES EPerson(eperson_id),
   in_archive        BOOL,
   withdrawn         BOOL,
   last_modified     TIMESTAMP WITH TIME ZONE,
   owning_collection INTEGER
 );

 ---
 -- Collection2Item table
 ---
 CREATE TABLE Collection2Item
 (
   id            INTEGER PRIMARY KEY,
   collection_id INTEGER REFERENCES Collection(collection_id),
   item_id       INTEGER REFERENCES Item(item_id)
 );

instead have

 ---
 -- Item table
 ---
 CREATE TABLE Item
 (
   item_id       INTEGER PRIMARY KEY,
   submitter_id  INTEGER REFERENCES EPerson(eperson_id),
   in_archive    BOOL,
   withdrawn     BOOL,
   last_modified TIMESTAMP WITH TIME ZONE
 );


 ---
 -- Collection2Item table
 ---
 CREATE TABLE Collection2Item
 (
   id                INTEGER PRIMARY KEY,
   collection_id     INTEGER REFERENCES Collection(collection_id),
   item_id           INTEGER REFERENCES Item(item_id),
   owning_collection BOOL
 );

Ownership is a relationship and not an attribute, and the dependency is
one way. This also is an example that enforces third normal form
because you cannot have an owning collection for which the item is not
a member. (Although you can have multiple owners).
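
If multiple owners are not wanted, one possible way to rule them out in the relationship table is a partial unique index. The sketch below is only an illustration, not the DSpace schema: it uses SQLite through Python's sqlite3 module so that it can be run as-is, and the index name, the NOT NULL/UNIQUE constraints and the helper function are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Collection (collection_id INTEGER PRIMARY KEY);
CREATE TABLE Item (item_id INTEGER PRIMARY KEY);
CREATE TABLE Collection2Item (
  id                INTEGER PRIMARY KEY,
  collection_id     INTEGER NOT NULL REFERENCES Collection(collection_id),
  item_id           INTEGER NOT NULL REFERENCES Item(item_id),
  owning_collection BOOL NOT NULL DEFAULT 0,
  UNIQUE (collection_id, item_id)
);
-- Assumption: at most one row per item may carry the owner flag.
CREATE UNIQUE INDEX one_owner_per_item
  ON Collection2Item(item_id) WHERE owning_collection = 1;

INSERT INTO Collection VALUES (1), (2);
INSERT INTO Item VALUES (10);
INSERT INTO Collection2Item (collection_id, item_id, owning_collection)
  VALUES (1, 10, 1), (2, 10, 0);
""")


def owning_collection(conn, item_id):
    """Return the id of the owning collection, or None if no row is flagged."""
    row = conn.execute(
        "SELECT collection_id FROM Collection2Item "
        "WHERE item_id = ? AND owning_collection = 1",
        (item_id,),
    ).fetchone()
    return row[0] if row else None


def make_owner(conn, collection_id, item_id):
    """Try to flag a membership row as owner; False if another owner exists."""
    try:
        conn.execute(
            "UPDATE Collection2Item SET owning_collection = 1 "
            "WHERE collection_id = ? AND item_id = ?",
            (collection_id, item_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Because the flag lives on the membership row, an owner is always a collection the item belongs to, and the index rejects a second flagged row for the same item.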


 When an Item is accessed directly,
 by itself without the navigational context of one of the
 Collections it belongs to, it consults the owning Collection for
 display style
 this is only one possibility and it is not the most useful (see
 MetadataStyleSelection in 1.5)

Very true and we will see that Manakin will mix this up even more.

  and policies (e.g. access control by Collection admins).

 the auth system needs a lot of work. In my community admin patch, which
 introduces some hierarchical control, I used the owning collection as
 the only real parent, i.e. if I want to modify the item but I do not
 have direct permission, I check for the ADMIN right on the owning
 collection... This is not optimal: if we have an item mapped into
 another collection, I think that only directly authorized people or
 ADMINs of both collections should manage it.
 For Daniele's work I recommend keeping ownCollection in place, but I
 think that we need to remove it in a future version.

 PS: the concept of owning collection is also used in the workflow and
 submission system, but there is one main difference: the data are
 stored in the inProgressSubmission object, not in the item object. In
 this case too I hope that we can introduce a more modular way to select
 the workflow process and submission process than simply using the
 owning collection.
 Andrea

Andrea, that is another example of what I speak of above. At least in
this case, we see that inProgressSubmission (or WorkspaceItem) is
really a container like collection, and we've attached the item to
that container rather than attaching the container to the item.  I
just think this is much cleaner, and it does not require altering the
item table to move an item between WorkspaceItem and WorkflowItem when
it is moved from Submission to Management Workflows.
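
To make the container point concrete, here is a small sketch of moving an item from a workspace container to a workflow container without touching its Item row. The table layouts are simplified assumptions rather than the actual DSpace schema, start_workflow is a hypothetical helper, and SQLite (via Python's sqlite3) is used only so the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Item (item_id INTEGER PRIMARY KEY, last_modified TEXT);
CREATE TABLE WorkspaceItem (
  workspace_item_id INTEGER PRIMARY KEY,
  item_id           INTEGER UNIQUE REFERENCES Item(item_id),
  collection_id     INTEGER
);
CREATE TABLE WorkflowItem (
  workflow_item_id INTEGER PRIMARY KEY,
  item_id          INTEGER UNIQUE REFERENCES Item(item_id),
  collection_id    INTEGER
);
INSERT INTO Item VALUES (10, '2007-11-20');
INSERT INTO WorkspaceItem (item_id, collection_id) VALUES (10, 1);
""")


def start_workflow(conn, item_id):
    """Move an item from the workspace container to the workflow container."""
    with conn:  # one transaction: both statements commit or neither does
        conn.execute(
            "INSERT INTO WorkflowItem (item_id, collection_id) "
            "SELECT item_id, collection_id FROM WorkspaceItem WHERE item_id = ?",
            (item_id,),
        )
        conn.execute("DELETE FROM WorkspaceItem WHERE item_id = ?", (item_id,))


start_workflow(conn, 10)
```

Only the two container tables change; the Item row, including last_modified, is left as it was.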

I would support moving or removing owning_collection if it improved
the model and the ability to work with tools.  I think we would want
a path of deprecation however, and using the above relationship
approach could more easily give us that.
-Mark


-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech


Re: [Dspace-tech] Differences between the data model and the trunk

2007-11-19 Thread James Rutherford
There are some acknowledged discrepancies between the data model, the
API, and the capabilities afforded by the UI. I'm not well placed to
discuss the decisions made in the early development of DSpace, but I
think the idea was to allow for many-to-many relationships in the model,
while being more restrictive in the API/UI mostly for pragmatic reasons
(basically, it was really hard / problematic to reflect the many-to-many
relationship all the way up the stack). There are all sorts of
implications of an object having multiple parents, the most obvious
being authorization policies, but I'm sure there are more.

Perhaps someone else (Rob?) can provide some insight here.

cheers,

Jim


-- 
James Rutherford  |  Hewlett-Packard Limited registered Office:
Research Engineer |  Cain Road,
HP Labs   |  Bracknell,
Bristol, UK   |  Berks
+44 117 312 7066  |  RG12 1HN.
[EMAIL PROTECTED]   |  Registered No: 690597 England




Re: [Dspace-tech] Differences between the data model and the trunk

2007-11-19 Thread Andrea Bollini
The mismatch is very old...
I think that the API is the most reliable source of information. I
assume that all of us are happy with many-to-many associations for
communities (also with themselves), collections & items. I'm not sure
whether we want to keep a many-to-many for item-bundle-bitstream. In my
opinion these are only one-to-many and we should change the API (and
db) accordingly. I know that Jim's work on versioning uses the
many-to-many association for item & bundle to share some of them
between different versions; I hope that we can change this to share
only the physical location of the bitstreams. That way it should be
simpler to keep different versions that change only the metadata of a
bitstream.

Community * - * Community
Community * - * Collection
Collection * - * Item
Item 1 - * Bundle
Bundle 1 - * Bitstream
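
One possible way to express the "Item 1 - * Bundle" line in the database without giving up the link-table layout is a UNIQUE constraint on the child column. The sketch below is an assumption-laden illustration only: the Item2Bundle table is modelled on the Collection2Item table discussed elsewhere in this thread, attach_bundle is a hypothetical helper, and SQLite (via Python's sqlite3) is used so that it runs standalone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Item   (item_id   INTEGER PRIMARY KEY);
CREATE TABLE Bundle (bundle_id INTEGER PRIMARY KEY);
-- UNIQUE on bundle_id: a Bundle can be linked to at most one Item,
-- while an Item can still be linked to many Bundles.
CREATE TABLE Item2Bundle (
  id        INTEGER PRIMARY KEY,
  item_id   INTEGER NOT NULL REFERENCES Item(item_id),
  bundle_id INTEGER NOT NULL UNIQUE REFERENCES Bundle(bundle_id)
);
INSERT INTO Item VALUES (1), (2);
INSERT INTO Bundle VALUES (100), (101);
""")


def attach_bundle(conn, item_id, bundle_id):
    """Return True if linked, False if the bundle already has a parent Item."""
    try:
        conn.execute(
            "INSERT INTO Item2Bundle (item_id, bundle_id) VALUES (?, ?)",
            (item_id, bundle_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

The many-to-many associations in the list (Community, Collection, Item) would keep plain link tables without such a constraint.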

Best,
Andrea



   


-- 
Dott. Andrea Bollini
Technical lead, Java application development and training
Library Services and Electronic Publishing Section
CILEA, http://www.cilea.it
tel. +39 06-59292831  cel. +39 348-8277525





[Dspace-tech] Differences between the data model and the trunk

2007-11-16 Thread daniele.ninfo
Hi all,

I'm writing to point out some differences between the data model shown on the
DSpace site
( http://www.dspace.org/index.php?option=com_content&task=view&id=149#data_model )
and the actual code present in /trunk
( http://dspace-sandbox.googlecode.com/svn/mirror/dspace/trunk/ ).

There are differences in the associations between objects of the model: some 
one-to-many associations in the model are replaced in the code by many-to-many 
ones. I found 3 examples:

- In the model, a Collection can be owned by only one Community, whereas in the
code a Collection can be owned by many Communities.
- In the model, a Bundle can be owned by only one Item, and in the code it can
be owned by more than one Item.
- In the model, a Bitstream can be owned by only one Bundle, and in the code by
more than one.

In the database schema, all the associations are many-to-many; there are no
restrictions.
I also looked at Jim's DAO-prototype (
http://dspace-sandbox.googlecode.com/svn/branches/dao-prototype/ ), and it
reflects the database schema, using only many-to-many associations.

I'm working on a prototype to introduce Hibernate, and I'd base my work on
Jim's choice, but I'd like to have some opinions about it. What should we do?
Update the data model associations? Follow the model and update the code in
/trunk? What kind of associations should be kept and what dropped?

My mentor Andrea will reply to this email with his opinions so that
everyone can see them. We are both interested in hearing other opinions so
that we can choose the best way forward.

Cheers,
Daniele

