Re: [Fedora-commons-developers] The REST API, The Resource Index and the Semantic Web

Matt Zumwalt Sun, 08 Nov 2009 16:19:26 -0800

+1 in praise of this idea

On Nov 7, 2009, at 4:48 AM, Asger Askov Blekinge wrote:


> I very much like what you are thinking here.
>
>
> On Fri, 2009-11-06 at 22:36 +0100, Steve Bayliss wrote:
>> Thinking over the current debates over the REST API, particularly
>> manipulating relationships, and how the resource index fits in with this,
>> I wonder if there is some unified approach that could be used to relate
>> all of these together in a semantic web-friendly, REST-friendly,
>> Web 2.0-friendly model.
>>
>> Asger's work on Enhanced Content Models, and particularly the ideas around
>> a "reference counting" mechanism for triples to get around some of the
>> limitations with the current single-graph resource index, plus our own
>> work on having arbitrary RDF datastreams propagated to the resource index
>> (and the inherent problems with this), also feed into this thinking, along
>> with Carsten Friedrich's recent post expressing a desire for a
>> relationships API that is not tied to needing to manipulate individual
>> RELS-EXT, RELS-INT and DC datastreams.  Ben Armintor's comments on the wiki
>> on a (sub-)graph-centric approach to manipulating relationships are also
>> relevant.
>>
>> This is early-stage thinking, but I thought it might be useful to get
>> these ideas out there, albeit in a bit of a raw state.  And spending too
>> long trying to define a vision of where you want to get to can get in the
>> way of actually getting there...
>>
>> What follows is pretty dependent on Fedora's Resource Index being enabled.
>> It is also Mulgara-centric, which is not exactly in line with current
>> thinking.  So, completely ignoring the
>> "triplestore-is-only-a-cache-and-might-not-even-exist" issue...
>>
>
> As far as I can see, you actually assume that the triple store is only a
> cache, but you do require that it exist. "Triple store is only a cache"
> means much the same as "every triple in the triplestore should be
> expressed in one of the objects".
>
>
>
>> So:
>>
>> Fundamentally two "kinds" of APIs:
>>
>> 1) an API much like the current SOAP API, with a Fedora-object-centric
>> view of the world, for manipulating objects, datastreams, disseminators
>> etc
>>
>> 2) a "semweb" API, with RDF graph expression(s) of the Fedora repository,
>> where resource URIs in the graph (objects, datastreams, disseminators etc)
>> are resolvable, and are REST endpoints both for disseminating the contents
>> of the repository (bitstreams, resource metadata, RDF graphs describing
>> resources etc), and for making changes to the repository, using REST
>> semantics.  So you could navigate the resource index to discover
>> resources, then use the resource identifiers as REST endpoints.
>>
>> So essentially the "semweb" API would represent a coming-together of the
>> REST API and the resource index.  I think Asger's current proposal for an
>> alternative REST API would fit in very well with this in terms of exposing
>> the kind of REST endpoints that would be needed - and would provide the
>> resolvable resource URIs for the RDF representation(s).
>>
>> The Resource Index and graphs (models)
>> ======================================
>> Currently the Fedora Resource Index is a single graph, <#ri> (or
>> <rmi://someserver/fedora#ri>).
>>
>> Mulgara supports creation of multiple models (or graphs) and querying
>> across multiple graphs.  (Fedora does make use of additional graphs: a
>> datatyping graph, and a full text model if full text indexing is enabled.)
>>
>> Mulgara also supports creation of "View" models which do not hold triples,
>> but are a view over multiple models, for instance the union of several
>> graphs: http://docs.mulgara.org/itqloperations/views.html
>>
>> It should therefore be possible to express a Fedora repository as a set of
>> individual graphs whilst still presenting an overall single graph view of
>> the repository, with sub-graphs being individually identifiable.
>>
>> Essentially some kind of hierarchy of graphs and views, for example as
>> below.  (Please ignore the actual model/graph identifiers used, I've not
>> thought those through... this is just for conceptual illustration!  And
>> note that these are not Fedora resource identifiers - they are identifiers
>> for graphs and sub-graphs describing Fedora resources, with triples
>> containing URIs that resolve to Fedora resources.)
>>
>> <#ri> - a view containing:
>>  <#some:pid> - object graph for some:pid, a view containing:
>>    <#some:pid/properties> - graph containing object properties
>>    <#some:pid/datastreams> - a view containing:
>>      <#some:pid/datastreams/rels-ext> - graph containing triples from rels-ext
>>      <#some:pid/datastreams/rels-int> - graph containing triples from rels-int
>>      <#some:pid/datastreams/dc> - graph containing triples from DC
>>      <#some:pid/datastreams/{rdf datastream}> - graph containing triples from some other rdf datastream
>>      <#some:pid/datastreams/{dsid}/properties> - graph containing properties of datastream {dsid} (state, last modified, etc)
>>  <#some:otherpid> - object graph for some:otherpid, a view containing:
>>    <#some:otherpid/properties> - etc
>>    <#some:otherpid/datastreams> - etc
>>      ...
>>
>> There's undoubtedly stuff I haven't thought about that should be included
>> above (notably disseminators).  And there's probably a better design of
>> this hierarchy.  But as a principle...
>>
>> The top-level <#ri> graph would still look like it does today.
>>
>> This top level view could be (disseminated from) a "special" Fedora object
>> representing the repository itself (an idea I know has been floating
>> around).
>>
>> This could get around the situation where, if one allowed arbitrary RDF
>> datastreams to be propagated to the resource index and two datastreams
>> asserted the same triple, deletion of one of the datastreams would result
>> in deletion of the triple in the resource index although the triple is
>> still being asserted by the second datastream.
>>
>> In the above example, if a triple was asserted by two different
>> datastreams then the triple would be present in two different graphs (one
>> graph for each datastream).  The top level <#ri> view would show a single
>> triple; however, deletion of the triple from one RDF datastream would
>> result in it being removed from one graph whilst still leaving it in the
>> graph for the other datastream, and therefore it would still be asserted
>> in the resource index.
>
> And you have thus beautifully solved an old Fedora problem!
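
To make the deletion behaviour concrete, here is a minimal sketch in plain Python rather than Mulgara/iTQL. The class name, the graph name "custom-rdf" and the predicate URI are made up for illustration; the only idea taken from Steve's message is "one graph per RDF datastream, with <#ri> as a union view that holds no triples itself".

```python
from __future__ import annotations

from collections import defaultdict


class GraphStore:
    """Toy store: one named graph per RDF datastream, plus a union view."""

    def __init__(self) -> None:
        # graph name -> set of (subject, predicate, object) triples
        self._graphs: dict[str, set[tuple[str, str, str]]] = defaultdict(set)

    def add(self, graph: str, triple: tuple[str, str, str]) -> None:
        self._graphs[graph].add(triple)

    def remove_graph(self, graph: str) -> None:
        # Deleting a datastream drops only that datastream's own graph.
        self._graphs.pop(graph, None)

    def view(self) -> set[tuple[str, str, str]]:
        # The view stores nothing; it is recomputed as the union of all
        # graphs, so a triple asserted twice still shows up only once.
        merged: set[tuple[str, str, str]] = set()
        for triples in self._graphs.values():
            merged |= triples
        return merged


store = GraphStore()
triple = (
    "info:fedora/some:pid",
    "http://example.org/rel#isPartOf",  # hypothetical predicate
    "info:fedora/some:otherpid",
)
store.add("#some:pid/datastreams/rels-ext", triple)
store.add("#some:pid/datastreams/custom-rdf", triple)

store.remove_graph("#some:pid/datastreams/custom-rdf")
still_asserted = triple in store.view()  # rels-ext still asserts it
```

The triple only disappears from the view once every graph that asserts it has been removed, which is the effect the reference counting idea was after, without keeping any counts.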
>
>>
>> Resolvable RI URIs - being more Semantic Web- and Web 2.0-friendly
>> ==================================================================
>> The resource index uses the "fedora" namespace in the info URI scheme to
>> identify objects, datastreams, disseminators etc, eg
>> <info:fedora/some:pid>.
>>
>> It could also be useful to expose resolvable URIs in the resource index,
>> as an alternative view.  For instance, something akin to a URL-rewriting
>> mechanism could be used to transform <info:fedora/some:pid> into
>> http://server:port/fedora/objects/some:pid (using the proposed
>> alternative REST API syntax).
>>
>> On the way in, queries (updates, etc) would have resolvable http
>> identifiers translated back to the info:fedora scheme.  (So RELS-EXT,
>> RELS-INT etc would continue to use the info:fedora scheme.)
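
A rough sketch of that two-way rewriting, in Python. The base URL (localhost:8080) is an assumption standing in for "server:port", and the function names are invented. It only swaps the prefix for object URIs as in the example above; how datastream or disseminator URIs should map depends on the final REST syntax and is not addressed here.

```python
INFO_PREFIX = "info:fedora/"
# Assumed deployment base; the message only gives the pattern
# http://server:port/fedora/objects/...
HTTP_PREFIX = "http://localhost:8080/fedora/objects/"


def to_external(term: str) -> str:
    """On the way out: info:fedora URI -> resolvable http URI."""
    if term.startswith(INFO_PREFIX):
        return HTTP_PREFIX + term[len(INFO_PREFIX):]
    return term  # anything else passes through untouched


def to_internal(term: str) -> str:
    """On the way in: resolvable http URI -> info:fedora URI."""
    if term.startswith(HTTP_PREFIX):
        return INFO_PREFIX + term[len(HTTP_PREFIX):]
    return term


def rewrite_triple(triple, rewrite):
    """Apply a rewriting function to subject, predicate and object."""
    return tuple(rewrite(term) for term in triple)


internal = (
    "info:fedora/some:pid",
    "http://example.org/rel#isPartOf",  # hypothetical predicate
    "info:fedora/some:otherpid",
)
external = rewrite_triple(internal, to_external)
round_trip = rewrite_triple(external, to_internal)
```

Because the stored triples never change, RELS-EXT and RELS-INT keep the info:fedora scheme and only the "external" view sees http URIs.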
>>
>> Essentially this would be an "external" view of the resource index
>> containing resolvable URIs for Fedora resources that are also REST
>> endpoints.
>>
>> It should also be possible to disseminate sub-graphs with resolvable URIs
>> as (for example) OAI-ORE resource maps.
>>
>> Mapping between Fedora objects and the resource index
>> =====================================================
>> Currently the specification of what triples get created for Fedora
>> objects, datastreams and properties is embodied in imperative Java code.
>>
>> It could be possible to move this to a declarative specification, perhaps
>> as part of the CMA.
>>
>> For instance the base content model that every object belongs to could
>> specify:
>> - an XSLT for generating the "system" triples for Fedora object and
>>   datastream properties, relationships between objects, datastreams and
>>   disseminators; and which graph the triples should be added to
>> - an XSLT for generating triples from RELS-EXT; and which graph the
>>   triples should be added to
>> - an XSLT for generating triples from RELS-INT; and which graph the
>>   triples should be added to
>>
>> "User" content models could for instance specify that XML metadata
>> datastream xyz should be converted using an XSLT into RDF, and the content
>> model would also indicate what graph the triples should be created in.
>>
>> (XSLT is just used as an example, there may be better/alternative
>> approaches, such as GRDDL, and a combination of methods may be best)
> I was actually thinking that this could be expressed as disseminators.
> Then the content model would only have to express which disseminator to
> call.
>
>
>>
>> Validation criteria (RDF schema, ontology, XML schema etc) could also be
>> defined in a similar manner.
>>
>> Unified relationships API
>> =========================
>> Having declarative specifications of the relationship between graphs in
>> the resource index and the Fedora object model would help in implementing
>> a unified relationships API - ie a method of specifying modifications to
>> triples at the repository level, with the API resolving this to what it
>> represents in terms of Fedora objects/datastreams and performing the
>> necessary modifications on these.
>>
>> Persistence is fundamental - all relationships should be stored in the
>> filesystem - adding triples to Mulgara without persisting them in the
>> Fedora object model should not be allowed.
> And thus the triple store IS only a cache ;) But this is required.
>
>
>>
>> This needs thinking about more.  For instance, if an arbitrary triple is
>> to be added (that is, a triple that does not make an assertion about a
>> Fedora object or datastream, for example), what object should it be
>> stored in?  Should it be possible to add triple(s) that assert a new
>> datastream or Fedora object?  (ie having a completely RDF-centric API.)
> I feel a useful distinction could be made between statements that create
> new graphs and statements that add triples to a graph. Graphs should only
> be created through the "traditional" API, as they create new objects and
> datastreams.
>
> About modifying the content of, say, the DC datastream through the triple
> store: for that to work we need a way to map RDF statements back into
> Dublin Core XML. This could be done by having two XSLTs (one for each
> direction) and marking those graphs that cannot map back as
> write-protected.
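
As a sketch of the "two mappings, else write-protected" idea: the Python below uses ordinary functions in place of the two XSLTs Asger mentions, purely to keep the example self-contained. The registry, the "xyz" entry and all function names are hypothetical; only literal-valued Dublin Core elements are handled.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"


def dc_to_triples(subject, dc_xml):
    """Forward mapping: one triple per DC element, with a literal object."""
    root = ET.fromstring(dc_xml)
    triples = []
    for child in root:
        if child.tag.startswith("{" + DC_NS + "}"):
            # ElementTree tags look like "{namespace}local"
            predicate = DC_NS + child.tag[len(DC_NS) + 2:]
            triples.append((subject, predicate, child.text or ""))
    return triples


def triples_to_dc(triples):
    """Reverse mapping: rebuild an oai_dc document from DC triples."""
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("oai_dc", OAI_DC_NS)
    root = ET.Element("{%s}dc" % OAI_DC_NS)
    for _subject, predicate, value in triples:
        if not predicate.startswith(DC_NS):
            raise ValueError("cannot map back to DC: " + predicate)
        elem = ET.SubElement(root, "{%s}%s" % (DC_NS, predicate[len(DC_NS):]))
        elem.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical registry: datastream kind -> reverse mapping, or None when
# only the forward (XML -> RDF) direction is defined.
REVERSE_MAPPINGS = {"dc": triples_to_dc, "xyz": None}


def is_write_protected(kind):
    """A graph is writable through the triple store only if it maps back."""
    return REVERSE_MAPPINGS.get(kind) is None


dc_xml = (
    '<oai_dc:dc xmlns:oai_dc="%s" xmlns:dc="%s">'
    "<dc:title>Test object</dc:title><dc:creator>Someone</dc:creator>"
    "</oai_dc:dc>" % (OAI_DC_NS, DC_NS)
)
triples = dc_to_triples("info:fedora/some:pid", dc_xml)
rebuilt = dc_to_triples("info:fedora/some:pid", triples_to_dc(triples))
```

A graph with a reverse mapping can accept triple-level edits and be written back to its datastream. One without a reverse mapping stays read-only from the RDF side and can only be changed by editing the datastream itself.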
>
> Regards
>
>
>>
>>
>>
>> Regards
>> Steve
>>
>>
>
>


_______________________________________________
Fedora-commons-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
