My gut feel is with approach #2 on this also.

-----Original Message-----
From: cwil...@gmail.com [mailto:cwil...@gmail.com] On Behalf Of Chris Wilper
Sent: 13 July 2009 16:00
To: Steve Bayliss
Cc: Daniel Davis; Asger Blekinge-Rasmussen; Schwichtenberg, Frank; fedora-commons-developers@lists.sourceforge.net
Subject: Re: [Fedora-commons-developers] How RELS-INT breaks the Fedora paradigm and opens the door for new and innovative solutions to old problems

I continue to support committing this long-awaited feature to trunk.
Andrew, Bill, and I have reviewed the code in the RELS-INT branch. I've
reviewed the functionality (via automated and manual tests), and it
works as advertised: It allows you to make statements/assert properties
about datastreams, and have them queryable in the resource index if it's
enabled.

RELS-INT seems like a fine abbreviation for "RELations/properties
emanating from INTernal parts of the object" to me. This is also the
term the community has been using (and some people have already
implemented locally) when discussing this capability.

Regarding disseminations and encapsulation, I think we need to continue
giving people the choice on whether they want to use Fedora as a simple
store of data objects, or to employ service definitions and deployments
to provide additional views and functions on their data. Regardless of
the approach you take, it turns out that it is still useful to make
statements about datastreams. Thus, RELS-INT.

However, to make this work I think we need to be clear on what URIs like
info:fedora/some:pid/DSID denote. Currently we're not.

As it stands, this form of info:fedora URI is officially defined as
identifying a datastream dissemination[1]. But if you ask people
familiar with Fedora what they think that URI refers to, the vast
majority will say it denotes a datastream. I think that should tell us
something.

I can see a few ways forward with this:

1) Create a new kind of info:fedora URI that denotes a datastream, and
   keep the existing official definition as-is.
2) Redefine info:fedora/some:pid/DSID to denote a datastream, not a
   datastream dissemination.
3) Expand the definition to say that it identifies both.

I think introducing a new identifier as #1 suggests is likely to result
in confusion (unless people carefully read documentation, which they
don't). While "redefine" in #2 sounds scary, I have a hard time
imagining how this would actually cause a problem in reality.
#3 might sound tempting, but goes against the webarch idea that a URI
denotes one thing.

Thoughts?

- Chris

[1] http://www.fedora-commons.org/confluence/display/FCR30/Fedora+Identifiers

On Mon, Jul 13, 2009 at 4:47 AM, Steve Bayliss <stephen.bayl...@acuityunlimited.net> wrote:
> I can see that the distinction between exposing "public" and "private"
> parts of the information model is important.
>
> At the moment, datastreams are publicly exposed, and at the moment
> some datastream properties (mime-type, last modified date, etc.) are
> indexed in the resource index.
>
> So RELS-INT isn't out of line with where we are. And maybe the name
> RELS-INT for internal relationships is appropriate after all.
>
> If we move to a dissemination-only situation, where datastreams are
> only internal, we may want to rethink this, and make RELS-INT triples
> (and current datastream properties) only accessible internally?
>
> There are current use-cases for RELS-INT - for instance controlling
> access to parts of an object, e.g. where thumbnail images can be seen by
> all, but only users with a certain role are allowed to see the full
> size image. There could be cases where low resolution images get seen
> by everyone, but high resolution images (large file sizes) are
> restricted due to bandwidth constraints.
>
> Steve
>
> -----Original Message-----
> From: Daniel Davis [mailto:dda...@fedora-commons.org]
> Sent: 10 July 2009 15:23
> To: Asger Blekinge-Rasmussen
> Cc: Schwichtenberg, Frank;
> fedora-commons-developers@lists.sourceforge.net
> Subject: Re: [Fedora-commons-developers] How RELS-INT breaks the Fedora
> paradigm and opens the door for new and innovative solutions to old problems
>
> I have added my remarks in the text.
>
> Daniel W. Davis
> dda...@fedora-commons.org
>
> Asger Blekinge-Rasmussen wrote:
>
> Hi
>
> I am not sure I understood your post entirely. I have inlined some
> questions on the parts I find unclear.
>
> On Mon, 2009-07-06 at 17:15 +0200, Daniel Davis wrote:
>
> I am concerned that the Fedora persistence/service model and the
> global information model are being conflated. A public API is needed
> to facilitate persistence operations but it is easy in Fedora to
> overlap the persistence/service model with the public information
> model which is intended to be suitable for use in graphs and in the Web
> architecture. Extending Fedora with writable service methods may make
> the separation more clear but for now it is easy to conflate the
> global information model and the persistence/service model.
>
> Indeed!
>
> It is not helpful that there are two ways to access content in
> Fedora, datastream disseminations and DO Service Methods (aka
> Operations) (formerly known as disseminators). Unfortunately,
> datastream disseminations are easy to understand and use while service
> methods are much harder and most applications avoid them.
>
> Indeed.
>
> Fedora must provide a clear means to encapsulate the content or
> Fedora will have problems evolving. Technically, datastream
> disseminations don't break encapsulation but having them facilitates
> thinking in terms of concrete datastreams instead of abstract stable
> resources.
>
> Indeed.
>
> If you make statements about the datastream itself you may step over
> the line since the internal representation will change given enough
> time and may break the truthfulness of statements made in global
> models if internal information is exposed.
>
> This is the old conflict between interfaces and implementations.
>
> Agreed, but it is useful to have the discussion in terms of Fedora
> because Fedora is not a programming language, a relational
> database, a Web server, or a triple-store (...), though there are
> conceptual overlaps with aspects of other computing paradigms, and now
> with semantic technologies (and the Semantic Web) the lines are even
> more blurred.
> I hope these discussions will help us think clearly
> regarding where Fedora needs to go and I think your work represents a
> very important part.
>
> Internal use of RDF is a wonderful generic approach to support
> encapsulation. But statements in the
> external (global) model must only be about facts which should be known
> externally (globally) and for a number of reasons, particularly
> preservation, should be relatively stable.
>
> Indeed.
>
> Externally, datastreams should likely not be objects and it would be
> best if external models avoid statements about concrete datastreams
> (and likely datastream disseminations) going forward. Statements
> about the internal structure of the DO should not show up in external
> models. External models should ignore the existence of datastreams
> and assert statements about the DOs, services and methods; you can
> get the same functionality without making statements about internal
> persistence artifacts that must change over time if Fedora is to
> evolve.
>
> I agree with the "should" statements. External models "should" not
> depend on the internal representation. I will say, however, that
> currently this "should" is not possible. Too much is directly linked to
> the internal representation for not depending on it to be achievable.
>
> I have a general problem with the Fedora API, which is perhaps
> understandable, given that I am the guy with the
> EnhancedContentModels. The API is object centric, not class centric.
> You can get the list of methods/services on an object, but there is no
> easy or even hard way to get the list of services implied by a content
> model. As such, you have individual objects, decorated with services,
> not classes of objects.
> So, the one method that I would really like to see would, on a content
> model, give me the list of services that the subscribing data objects
> would have.
>
> Fedora needs to evolve.
> Let us (the community) debate extensions of
> APIs or replacement APIs like MediaShelf's contribution of the REST
> API. Fedora is moving to a more community-driven model managed with
> new committers being added from outside the Duraspace/Fedora Commons
> organization. Please note that I am slowly moving out of the
> Duraspace organization so I think of myself as moving into the
> application community. The Fedora Repo can have many APIs, some of which
> offer improved capabilities and some of which are less capable but
> support widely used standards. For example, there are projects
> underway to provide WebDAV and JCR interfaces.
>
> I agree with the sense of your comments and would like to think about
> how it could be implemented. As to the approach for your ideas, I
> think keeping the number of operations in the repository APIs to a
> minimum is an important design principle (the classic hourglass
> design). The methods you suggest could be implemented consistently on
> all CModel objects, conceptually like disseminators. Currently the
> Fedora Repo has a limited ability to easily or optimally add the
> functionality associated with those methods but it can be done.
> Improvements in modularity, especially if Fedora can be moved to
> an OSGi platform, will make adding custom functionality easier. One
> big problem is that there is no implementation on the new REST API for
> getDissemination, which is needed for an object-centric,
> service-oriented approach. If I had my way, we would eventually
> deprecate the datastream disseminations and use the service
> (dissemination) approach because it can provide a uniform interface
> for all disseminations at the cost of a slightly longer URI/URL.
>
> If mapping to the Web architecture, the DO-service-method-parameters
> should preferably be a stable URI which can also often be used in
> the Fedora architecture to cause the streaming of a serialized
> representation.
> It would be desirable that external URIs refer to
> some essential characteristic of the DO and its content that will
> always be true even if the concrete implementation of the object
> changes.
>
> Indeed.
>
> Internal models should be free to make statements about the internal
> structure of the Fedora object and, I think but am not sure, can use
> statements which are derived from external views of objects since such
> statements are supposed to be stable.
>
> Indeed.
>
> However, there needs to be a separation of concerns between the
> external (global) and internal views. Care should be taken about the
> visibility of internal statements. They cannot be held to be globally
> true. Internal information can be presented to the external model
> through the use of URIs established through the CMA using
> DO-service-method-parameters, exercising care that there is an
> abstraction placed between the internal model and the external model.
> This enables statements made to global URIs to be globally true.
>
> I am not sure I understand what you are saying here.
>
> I am suggesting that we enable accessing virtually all the information
> about the Fedora DOs through services (disseminations) on objects
> (including specializing CModel objects). Also, it is feasible to
> design services which write to Fedora DOs or cause side-effects.
> There are a few places where this is not an optimal approach but I
> think it should be a primary way of interacting with the Fedora Repo.
>
> There still need to be methods that permit access to the internal
> structures of the DO so that "privileged" applications can create the
> concrete persisted internal content. These operations need to know
> about datastreams.
>
> And now, for the first time, I understand what caused the
> separation between API-A and API-M.
>
> I am not sure we know enough to create APIs which absolutely and
> clearly separate the external and internal models.
> I think that we
> need to keep the separation of the models clear in our minds and
> exercise care when we use them and in the evolution of Fedora's
> design.
>
> I think it might be time to rethink the API and the evolution in
> Fedora based on this separation. I am not aware whether there are design
> guidelines regarding the separation.
>
> I agree. There is material but it is scattered and finding the entry
> point is hard. Now that there is a good Wiki and tracker we can make this
> information more accessible. It would be great because we need to
> articulate the design patterns if we want consistent evolution of
> Fedora. Fedora is very pattern driven, which, as you know, is a major
> theme in software development these days that we can apply. I like to
> think of your work as an important step toward a "Model-driven Content
> Architecture".
>
> In particular, I think we need to be careful when we extract this
> information into a triple-store where it is easy to combine/conflate
> the two models and inadvertently mix statements whose truthfulness is
> long term with transient implementation-specific information.
>
> So, we should hold off on integrating RELS-INT?
>
> I am just adding my comments to the debate and posing questions. The
> Fedora developers have been very careful in introducing RDF because
> its technology was immature, there were insufficient open source
> support libraries, and performance was low. First and foremost Fedora
> needed to be a trustworthy platform for the creation of repositories
> and a degree of caution is important.
>
> Things have changed regarding RDF. It is probably practical now to
> describe the internal relationships/structure of a Fedora DO using RDF
> both as a class (model) and for instances. Likely some of those
> internal relationships need to reference external entities (what are
> the rules?).
>
> A classic example is versioning which is currently done using
> hierarchical XML.
> But what happens if we have a DO containing
> Computer Aided Design information which takes five datastreams? After
> a future format migration the CAD information is contained within one
> datastream (keeping the old version around because that is a policy of
> your institution). RDF can easily represent that but doing it in the
> current versioning schema is difficult. Let's not even start on
> representing email collections.
>
> I will try to add more pointed comments about RELS-INT in a separate
> note. The review needs all of us, and I was hoping to provide some
> background information I felt relevant to the review. Ultimately the
> committers will decide but I am glad that Steve stepped up to take
> this on. RELS-EXT and RELS-INT have been a tough long term design
> debate which I don't think is over yet.
>
> By the way, RELS-INT could be renamed though likely RELS-EXT needs to
> stay the same for a while to avoid causing breakage.
>
> Regards
>
> Daniel W. Davis
> http://fedora-commons.org
> dda...@fedora-commons.org
> (607) 255-6090 (Office)
>
> Asger Blekinge-Rasmussen wrote:
>
> Hi Frank
>
> Thanks for the reply.
>
> Yes, you definitely nailed down the missing points.
>
> RELS-INT and RELS-EXT are misnamed, for the very reason you wrote. No
> contest there.
>
> About the RELS-EXT relations to datastreams in the object, that was a
> hack. A fedora object has some relation, fedora-view:disseminates I
> think, to each datastream belonging to this object. Since this is the
> same relation to every datastream, it is not possible to define a useful
> OWL allValuesFrom restriction. In fact it is possible, but it has the
> effect of demanding that all datastreams in the object are of the
> specified class. Similarly, cardinality on that relation can only
> specify the total number of datastreams.
> I got around this by making my own relation (in RELS-EXT) to the
> datastreams in the same object, but as you point out, these relations
> could go to datastreams in another object.
>
> Anyway, the introduction of RELS-INT does bring the current object
> serialisation (FOXML) into question. A datastream object conceptually
> contains
>   An ID
>   Object properties (in RELS-INT)
>   Content (in the datastream proper)
>   Versioning (in the datastream proper)
>   Audit trail (in the AUDIT datastream in the Object)
>
> So a datastream object is serialised into three datastreams: itself,
> RELS-INT and AUDIT. And the fedora object then gets a relation to this
> object. To accommodate the new conceptual structure, it would probably
> be simpler to serialise each datastream to its own XML file, and make
> links from the fedora object to each datastream it "contains".
> The problem with this approach is that the traditional Fedora objects
> will just become a collection of datastreams, and properties about this
> collection, and not data in itself. This could easily be modelled with a
> datastream object, and thus we have come full circle. Objects will in
> effect be reduced to having just one datastream.
> This idea is starting to scare me somewhat....
>
> Regards
>
> On Mon, 2009-07-06 at 12:14 +0200, Schwichtenberg, Frank wrote:
>
> Hi Asger,
> I absolutely agree with you. That seems to be the logical enhancement
> of Enhanced Content Models. :-)
>
> I just wonder if the idea of RELS-EXT and RELS-INT holds. So, you are
> right pointing out datastreams are entities (or objects), now. They
> have URIs and it is possible to make statements in RELS-INT with
> datastreams of other Fedora objects as object(-of-the-statement). So
> far your idea to enhance Enhanced Content Models, which is great, I
> think.
>
> My criticism of "the idea of RELS-EXT and RELS-INT" would be: One can
> refer to datastreams external to the Fedora object the RELS-INT belongs to.
> And it is possible to refer to entire Fedora objects from RELS-INT.
> Obviously it is also possible to refer to datastreams from RELS-EXT,
> including those of the Fedora object the RELS-EXT belongs to. So "EXT"
> and "INT" seem to be outdated. The difference between RELS-EXT and
> RELS-INT has nothing to do with relations to external or internal
> entities, but with making statements about the object
> (RELS-EXT) or about parts of the object (RELS-INT). So datastream URIs in
> statements (both in RELS-EXT and RELS-INT) bring in possible complexity.
>
> I don't want to say that's bad; just thoughts. Maybe that is something
> people are waiting for. And the possibility to specify datastream
> cardinality (maybe min and max) is great.
>
> Maybe that just brings us back to the question: why not just allow
> datastreams of RDF/XML content which automatically get propagated
> to the resource index?
>
> Regards, Frank
>
> -----Original Message-----
> From: Asger Blekinge-Rasmussen [mailto:a...@statsbiblioteket.dk]
> Sent: Friday, 3 July 2009 20:39
> To: fedora-commons-developers@lists.sourceforge.net
> Subject: [Fedora-commons-developers] How RELS-INT breaks the Fedora
> paradigm and opens the door for new and innovative solutions to old
> problems
>
> Hi
>
> Steve Bayliss has just finished adding the RELS-INT datastream to
> Fedora, as announced on this list. I have been in some discussion with
> him, as also shown on this list. This discussion has given me a
> chance to fully understand the conceptual change that RELS-INT brings.
>
> In the semantic web paradigm, everything with a URI is a thing, which
> can have properties and so on. But in Fedora, so far only Objects
> could have properties (relations).
>
> This all changed with the introduction of RELS-INT. Steve Bayliss has
> made a system for, in a fedora object, specifying object properties
> with a datastream id as subject. No more, no less.
>
> So datastreams are now objects, so to speak.
> They have a URI, and they
> can have properties themselves. Formerly, there was the Fedora Object,
> which had datastreams (blobs of data) and properties. Now there is the
> Datastream, which has ONE blob of data, and properties. Fedora objects
> now have a list of Datastreams, and properties for the object itself.
> So we have two levels of objects. This is the way the Fedora paradigm
> is broken.
>
> Big deal? Yes. Because if the datastreams can have relations, they can
> have the hasModel/rdf:type relation. So, suddenly we have a framework
> for talking about the classes of datastreams. Now, like the content
> models, there is the possibility to specify restrictions and demands
> on the datastream, both its relations and its content.
>
> Some might remember the old problem with the DS-COMPOSITE-MODEL
> datastream. There is no way to specify datastreams that might be
> there, only datastreams that have to be there, and there is no way to
> specify cardinality for datastreams. With the use of RELS-INT and
> enhanced content models, we can now specify
> something close to a solution to this problem.
> Enhanced Content Models give the ability to define an ontology for
> subscribing objects. This could include relations from the object to
> the object's datastreams. On such relations, Enhanced Content Models
> give the ability to make cardinality demands, and specify the
> class/content model of the range.
> So, in the RELS-EXT for an object you could make this blob
>
> <rdf:Description rdf:about="info:fedora/demo:object1">
>   <fedora-system:hasModel rdf:resource="info:fedora/demo:cm1"/>
>   <demo:hasDCdatastream rdf:resource="info:fedora/demo:object1/DC1"/>
>   <demo:hasDCdatastream rdf:resource="info:fedora/demo:object1/DC2"/>
>   <demo:hasDCdatastream rdf:resource="info:fedora/demo:object1/DC3"/>
> </rdf:Description>
>
> Then in the ontology we would specify something like
>
> <owl:Class rdf:about="info:fedora/doms:ContentModel_DOMS">
>   <rdfs:subClassOf>
>     <owl:Restriction>
>       <owl:onProperty rdf:resource="#hasDCdatastream"/>
>       <owl:minCardinality rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">3</owl:minCardinality>
>     </owl:Restriction>
>   </rdfs:subClassOf>
>   <rdfs:subClassOf>
>     <owl:Restriction>
>       <owl:onProperty rdf:resource="#hasDCdatastream"/>
>       <owl:allValuesFrom rdf:resource="info:fedora/demo:DCdatastreamcontentModel"/>
>     </owl:Restriction>
>   </rdfs:subClassOf>
> </owl:Class>
>
> This basically says that demo:object1 must have at least three
> hasDCdatastream relations to things of the type
> demo:DCdatastreamcontentModel
>
> Then this goes in the RELS-INT in demo:object1
>
> <rdf:Description rdf:about="info:fedora/demo:object1/DC1">
>   <fedora-system:hasModel rdf:resource="info:fedora/demo:DCdatastreamcontentModel"/>
> </rdf:Description>
> <rdf:Description rdf:about="info:fedora/demo:object1/DC2">
>   <fedora-system:hasModel rdf:resource="info:fedora/demo:DCdatastreamcontentModel"/>
> </rdf:Description>
> <rdf:Description rdf:about="info:fedora/demo:object1/DC3">
>   <fedora-system:hasModel rdf:resource="info:fedora/demo:DCdatastreamcontentModel"/>
> </rdf:Description>
>
> And voila, you have specified that objects of demo:cm1 must have at
> least three datastreams, which all have a specific content model.
>
> I have not fully thought everything above through, but I hope you get
> the gist of it. I would like to hear other people's thoughts on this.
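
The RELS-EXT/RELS-INT example above can be read as a simple validation rule. What follows is only a minimal sketch in Python, assuming the statements have already been extracted as (subject, predicate, object) tuples. The function name check_restrictions and the constants are hypothetical and are not part of Fedora or of Enhanced Content Models, and the check is a closed-world validation, not OWL reasoning.

```python
from collections import defaultdict

HAS_MODEL = "fedora-system:hasModel"
HAS_DC = "demo:hasDCdatastream"
OBJ = "info:fedora/demo:object1"
DS_MODEL = "info:fedora/demo:DCdatastreamcontentModel"

# Statements from the RELS-EXT example: the object points at three datastreams.
RELS_EXT = [(OBJ, HAS_MODEL, "info:fedora/demo:cm1")] + [
    (OBJ, HAS_DC, f"{OBJ}/DC{n}") for n in (1, 2, 3)
]
# Statements from the RELS-INT example: each datastream declares its model.
RELS_INT = [(f"{OBJ}/DC{n}", HAS_MODEL, DS_MODEL) for n in (1, 2, 3)]


def check_restrictions(triples, subject, prop, min_card, required_model):
    """Return a list of violations; an empty list means the object conforms."""
    # Collect the declared models of every subject (object or datastream).
    models = defaultdict(set)
    for s, p, o in triples:
        if p == HAS_MODEL:
            models[s].add(o)
    targets = [o for s, p, o in triples if s == subject and p == prop]
    problems = []
    if len(targets) < min_card:  # the minCardinality demand
        problems.append(
            f"{subject}: {len(targets)} x {prop}, need at least {min_card}"
        )
    for target in targets:  # the allValuesFrom demand
        if required_model not in models[target]:
            problems.append(f"{target}: not declared as {required_model}")
    return problems


if __name__ == "__main__":
    print(check_restrictions(RELS_EXT + RELS_INT, OBJ, HAS_DC, 3, DS_MODEL))
```

With all six datastream statements present the list of violations is empty; dropping a hasDCdatastream relation or a datastream's hasModel statement produces one message per unmet demand.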
> Think of this as a preliminary on how RELS-INT can be used in enhanced
> content models
>
> Regards
> Asger
>
> Enhanced content models to be found on ecm.sourceforge.net
>
> _______________________________________________
> Fedora-commons-developers mailing list
> Fedora-commons-developers@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
>
> -------------------------------------------------------
>
> Fachinformationszentrum Karlsruhe, Gesellschaft für
> wissenschaftlich-technische Information mbH. Registered office:
> Eggenstein-Leopoldshafen, Amtsgericht Mannheim HRB 101892.
> Managing director: Sabine Brünger-Weilandt.
> Chairman of the supervisory board: MinR Hermann Riehl.