On Nov 21, 2007, at 2:44 AM, Larry Stone wrote:

> This is really getting out of scope for dspace-tech,

But it's important to discuss direction transparently, so I do not
consider it out of scope.

> but I'd just like to make a plea to look at the data model in the
> abstract rather than at the implementation level: the way it appears
> in database tables *doesn't* *matter* at this stage of thinking about
> it, and I think it muddies the waters even to talk about them.

No, this is the dilemma: it really doesn't matter what we talk of
conceptually; the only "real" thing is what is implemented. Everything
else is a speculative attempt to describe what really exists as the
implementation of the DSpace storage solution. As for DSpace 2.0, yes,
we will have an abstract model, but it should be rooted in reality, in
what can actually be accomplished with the existing storage solution,
a.k.a. a relational database.

> There are objects, which perhaps have both attributes and
> relationships; that's the abstract way to discuss it. It is
> inconsequential whether attributes are implemented as columns in a
> table and relationships are an RDF triplestore -- what matters is the
> abstract model.

I would find it overly complex and dismaying if the result of the 2.0
re-architecture could not be expressed in simple relational terms.

> That said, I notice there is a tendency to add, or want to add, lots
> of different kinds of relationships to the data model. For example, an
> Item has an "owning" collection (or several) for the purpose of access
> control, and perhaps a different parent for UI appearance and yet
> another for navigation. Those could be typed relationships. There was
> some discussion of this in
> http://wiki.dspace.org/index.php/BitstreamRelationships too.

I'm actually just attempting to clarify what "really" exists in the
existing implementation of our data model. We currently do have that
relationship present in the schema. Thus, either that data model
documentation is really out of date, or the column shouldn't exist in
the schema implementing that model.

> Perhaps we could benefit from a very general relationship model that
> lets the API client create typed relationships between *any* DSOs,
> but of course it would need to enforce rules as well:
>  - acceptable domain and range of each kind of relationship operator
>  - schema restrictions on relationships (e.g. one-to-many, one-to-one)
>  - access control on the relationships themselves.

Sure, and we will ultimately return to how these would be expressed in
the schema. I predict that once proper normalization and exclusion of
"non-model" properties has occurred in the database schema, the above
would most certainly exist.

> All DSpace Objects would inherit some common traits, e.g. an
> identifier unique among all DSOs, and this mechanism that manages
> relationships between any DSOs. The mechanism implements all the
> schema restrictions and policies.

We already did all that at the Architectural review and the DAO
prototype.

> ...but that's what I mean about keeping the discussion abstract: I'm
> not going to say if the "relationship" is really "an RDF statement"
> no matter how much it looks like one.. Let's just look at the problem
> without getting boxed into a particular solution.

Well, if you get too abstract, then nothing gets done; developers lose
interest and threads die... We have a particular solution right now. My
use of "RDF" or "relational" terms isn't to box us in, but to begin to
draw out what is analogous across these technologies, and thus where
there is true abstraction, not abstraction for abstraction's sake.

Cheers,
Mark

>
> -- Larry
>
>> On Nov 20, 2007, at 3:55 AM, Andrea Bollini wrote:
>>> Larry Stone wrote:
>>>>> Collection * - * Item
>>>>>
>>>>
>>>> It's worth noting that while an Item may be a member of multiple
>>>> Collections, it still refers to only one of them as its "owner";
>>>> it is returned by getOwningCollection().
>>> true but IMHO this is not really needed...
>>
>> Well, what I've suggested in my previous email is not whether it is
>> needed or not, but where it should correctly reside in relational
>> terms. "owner" is a relationship, not an attribute of Item and
>> Collection, thus the more appropriate location would be in the
>> container or relationship tables, i.e.
>>
>> rather than:
>>
>>> -------------------------------------------------------
>>> -- Item table
>>> -------------------------------------------------------
>>> CREATE TABLE Item
>>> (
>>>   item_id           INTEGER PRIMARY KEY,
>>>   submitter_id      INTEGER REFERENCES EPerson(eperson_id),
>>>   in_archive        BOOL,
>>>   withdrawn         BOOL,
>>>   last_modified     TIMESTAMP WITH TIME ZONE,
>>>   owning_collection INTEGER
>>> );
>>
>>> -------------------------------------------------------
>>> -- Collection2Item table
>>> -------------------------------------------------------
>>> CREATE TABLE Collection2Item
>>> (
>>>   id            INTEGER PRIMARY KEY,
>>>   collection_id INTEGER REFERENCES Collection(collection_id),
>>>   item_id       INTEGER REFERENCES Item(item_id)
>>> );
>>
>> instead have
>>
>>> -------------------------------------------------------
>>> -- Item table
>>> -------------------------------------------------------
>>> CREATE TABLE Item
>>> (
>>>   item_id       INTEGER PRIMARY KEY,
>>>   submitter_id  INTEGER REFERENCES EPerson(eperson_id),
>>>   in_archive    BOOL,
>>>   withdrawn     BOOL,
>>>   last_modified TIMESTAMP WITH TIME ZONE
>>> );
>>
>>
>>> -------------------------------------------------------
>>> -- Collection2Item table
>>> -------------------------------------------------------
>>> CREATE TABLE Collection2Item
>>> (
>>>   id                INTEGER PRIMARY KEY,
>>>   collection_id     INTEGER REFERENCES Collection(collection_id),
>>>   item_id           INTEGER REFERENCES Item(item_id),
>>>   owning_collection BOOL
>>> );
>>
>> Ownership is a relationship and not an attribute, and the dependency
>> is one way.
>> This also is an example that enforces third normal form because you
>> cannot have an owning collection for which the item is not a member.
>> (Although you can have multiple owners).
>>
>>
>>>> When an Item is accessed directly, by itself without the
>>>> navigational context of one of the Collections it belongs to, it
>>>> consults the owning Collection for display style
>>> this is only one possibility and it is not the most useful (see
>>> MedataStyleSelection in 1.5)
>>
>> Very true and we will see that Manakin will mix this up even more.
>>
>>>> and policies (e.g. access control by Collection admins).
>>>>
>>> the auth system needs a lot of work; in my patch "community admin",
>>> which introduces some hierarchy control, I have used the owning
>>> collection as the only real parent, i.e. if I want to modify the
>>> item but I do not have direct permission, I check for ADMIN rights
>>> on the owning collection... this is not optimal; if we have an item
>>> mapped in another collection, I think that only directly authorized
>>> people or ADMINs of both collections should manage it
>>> For Daniele's work I recommend keeping ownCollection in place, but I
>>> think that we need to remove it in a future version.
>>>
>>> PS: the concept of owning collection is also used in the workflow
>>> and submission system, but there is a main difference: the data are
>>> stored in the inProgressSubmission object, not in the item object.
>>> Also in this case I hope that we can introduce a more modular way to
>>> select the workflow process and submission process than simply using
>>> the "owning collection".
>>> Andrea
>>
>> Andrea, that is another example of what I speak of above. At least in
>> this case, we see that inProgressSubmission (or WorkspaceItem) is
>> really a "container" like collection and we've attached the item to
>> that rather than attaching it to the item.
>> I just think this is much cleaner and actually does not require
>> altering the item table to move an item between WorkspaceItem and
>> WorkflowItem when it is moved from Submission to Management
>> Workflows.
>>
>> I would support moving or removing owning_collection if it improved
>> the model and the ability to work with tools. I think we would want
>> a path of deprecation however, and using the above "relationship"
>> approach could more easily give us that.
>> -Mark
>>
>>
>> -------------------------------------------------------------------------
>> This SF.net email is sponsored by: Microsoft
>> Defy all challenges. Microsoft(R) Visual Studio 2005.
>> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
>> _______________________________________________
>> DSpace-tech mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dspace-tech
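
For illustration only, here is a minimal sketch of the "ownership lives
in the relationship table" layout proposed in the quoted message. It is
not DSpace code. It uses Python's built-in sqlite3 module so it can be
run as-is, the tables are trimmed to the columns the example needs, and
the helper names (get_owning_collection, demo) are hypothetical. The
partial unique index is an added assumption, not part of the proposal:
it is one possible way to close the "multiple owners" gap noted above.

```python
import sqlite3

# Trimmed-down, hypothetical version of the proposed schema.
SCHEMA = """
CREATE TABLE Collection (collection_id INTEGER PRIMARY KEY);
CREATE TABLE Item (item_id INTEGER PRIMARY KEY);
CREATE TABLE Collection2Item (
    id                INTEGER PRIMARY KEY,
    collection_id     INTEGER NOT NULL REFERENCES Collection(collection_id),
    item_id           INTEGER NOT NULL REFERENCES Item(item_id),
    owning_collection BOOL NOT NULL DEFAULT 0,
    UNIQUE (collection_id, item_id)
);
-- Assumption (not in the original proposal): allow at most one owning
-- row per item by indexing only the rows flagged as owner.
CREATE UNIQUE INDEX one_owner_per_item
    ON Collection2Item(item_id) WHERE owning_collection = 1;
"""


def get_owning_collection(conn, item_id):
    """Return the owning collection id for an item, or None if it has none."""
    row = conn.execute(
        "SELECT collection_id FROM Collection2Item "
        "WHERE item_id = ? AND owning_collection = 1",
        (item_id,),
    ).fetchone()
    return row[0] if row else None


def demo():
    """Map one item into two collections and try to add a second owner."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Collection VALUES (?)", [(1,), (2,)])
    conn.execute("INSERT INTO Item VALUES (10)")
    conn.executemany(
        "INSERT INTO Collection2Item (collection_id, item_id, owning_collection) "
        "VALUES (?, ?, ?)",
        [(1, 10, 1), (2, 10, 0)],  # collection 1 owns the item, 2 only maps it
    )
    owner = get_owning_collection(conn, 10)
    try:
        # Flagging a second membership row as owner should hit the index.
        conn.execute(
            "UPDATE Collection2Item SET owning_collection = 1 "
            "WHERE collection_id = 2 AND item_id = 10"
        )
        second_owner_allowed = True
    except sqlite3.IntegrityError:
        second_owner_allowed = False
    conn.close()
    return owner, second_owner_allowed


if __name__ == "__main__":
    print(demo())
```

Because the owner flag sits on the membership row, an owning collection
is necessarily one the item belongs to, which is the point made in the
quoted message. PostgreSQL also supports partial unique indexes, so the
same constraint could be written there, though the exact DDL would need
checking against the real DSpace schema.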

