Re: [Fedora-commons-developers] The REST API, The Resource Index and the Semantic Web

Steve Bayliss Mon, 09 Nov 2009 04:40:15 -0800

Hi Carsten

Thanks for your feedback.


1) 2 APIs - my original post was badly worded here.  I agree that there
should not be a split of functionality; methods implemented in REST should
also be available in the SOAP API.  I think the distinction is:
i) SOAP API has an HTTP endpoint; resource identifiers are not part of the
URL and are passed in (along with the method) through SOAP
ii) REST API has HTTP endpoints; these can be considered as resolvable
resource identifiers, and can therefore be considered nodes in an RDF
graph expression(s) of the repository.

2) Mulgara - worth investigating how other triple stores treat multiple
graphs, querying across them, and potential for defining "views" (or other
mechanisms) that facilitate querying across multiple graphs.

3) Performance - this does need some investigation to see if this would be
an issue.  Hopefully by the appropriate definition of "views", there should
not be any pain in query composition (in most cases queries could run
against the top-level #ri view).
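
To make the query-composition point concrete, here is a rough sketch in
Python.  It uses generic SPARQL rather than Mulgara's iTQL, the helper name
select_all and the sub-graph URIs are made up for illustration, and whether a
view can be named in a FROM clause like this is an assumption that would need
checking against the triple store in use.

```python
def select_all(graphs):
    """Build a SPARQL SELECT whose default graph is the merge of `graphs`."""
    from_clauses = "\n".join(f"FROM <{g}>" for g in graphs)
    return f"SELECT ?s ?p ?o\n{from_clauses}\nWHERE {{ ?s ?p ?o }}"


# Hypothetical graph identifiers, for conceptual illustration only.
PER_DATASTREAM = [
    "rmi://someserver/fedora#some:pid/datastreams/rels-ext",
    "rmi://someserver/fedora#some:pid/datastreams/rels-int",
    "rmi://someserver/fedora#some:pid/datastreams/dc",
]

# Without a view, every graph involved has to be listed by the client.
explicit_query = select_all(PER_DATASTREAM)

# With a top-level view, the client names one graph only.
view_query = select_all(["rmi://someserver/fedora#ri"])
```

The second query is the shape clients already write today, which is why a
well-defined top-level view should keep query composition painless.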

Steve

-----Original Message-----
From: [email protected] [mailto:[email protected]] 
Sent: 08 November 2009 23:33
To: [email protected];
[email protected]
Subject: RE: [Fedora-commons-developers] The REST API, The Resource Index
and the Semantic Web


I think Steve's post is a very important contribution as it combines and
focuses a lot of recent discussion and shows a very promising way forward.
Provenance of triples is a big issue for us here and I'm particularly happy
to see a potential solution for this.

A few comments:

* About 2 APIs: I think it is a good idea to supply both a REST and a SOAP
API, but I would not split functionality between the two. I think it is
important to offer full functionality in both. I think people's choice of
API transport layer is mainly driven by how they write their client
(especially whether they use something like JavaScript where REST/JSON is
much easier or something like Java / C# where SOAP is easier and offers some
other advantages) rather than by what functionality they want to access. 

* While I understand the Mulgara focus in the discussion as it is the triple
store of choice for Fedora, I think it is important to keep an eye on
where the implementation is in principle portable to other triple stores and
where it uses Mulgara-specific features that do not exist in other stores.

* In the same context I think it is important to keep an eye on how
changes to the triple store storage model impact query times and query
composition (e.g. it might be painful if one had to explicitly name all
involved graphs in a query).

Carsten Friedrich
Research Team Leader
CSIRO ICT Centre
P: +61 (0) 2 6216 7019 
www.ict.csiro.au

> -----Original Message-----
> From: Steve Bayliss [mailto:[email protected]]
> Sent: Saturday, 7 November 2009 08:36
> To: [email protected]
> Subject: [Fedora-commons-developers] The REST API, The Resource Index
> and the Semantic Web
> 
> Thinking over the current debates over the REST API, particularly
> manipulating relationships, and how the resource index fits in with
> this, I wonder if there is some unified approach that could be used to
> relate all of these together in a semantic web-friendly, REST-friendly,
> Web 2.0-friendly model.
> 
> Asger's work on Enhanced Content Models, and particularly the ideas
> around a "reference counting" mechanism for triples to get around some
> of the limitations with the current single-graph resource index, plus
> our own work on having arbitrary RDF datastreams propagated to the
> resource index (and the inherent problems with this) also feeds into
> this thinking, along with Carsten Friedrich's recent post expressing a
> desire for a relationships API that is not tied to needing to
> manipulate individual RELS-EXT, RELS-INT and DC datastreams.  Ben
> Armintor's comments on the wiki on a (sub-)graph-centric approach to
> manipulating relationships are also relevant.
> 
> This is early-stage thinking, but I thought it might be useful to get
> these ideas out there, albeit in a bit of a raw state.  And spending
> too long trying to define a vision of where you want to get to can get
> in the way of actually getting there...
> 
> And what follows is pretty dependent on Fedora's Resource Index being
> enabled, it is also Mulgara-centric, which is not exactly in line with
> current thinking.  So completely ignoring the
> "triplestore-is-only-a-cache-and-might-not-even-exist" issue...
> 
> So:
> 
> Fundamentally two "kinds" of APIs:
> 
> 1) an API much as the current SOAP API, with a Fedora-object-centric
> view of the world, for manipulating objects, datastreams,
> disseminators etc
> 
> 2) a "semweb" API, with an RDF graph expression(s) of the Fedora
> repository, where resource URIs in the graph (objects, datastreams,
> disseminators etc) are resolvable, and are REST endpoints both for
> disseminating the contents of the repository (bitstreams, resource
> metadata, RDF graphs describing resources etc), and making changes to
> the repository, using REST semantics.  So you could navigate the
> resource index to discover resources, then use the resource
> identifiers as REST endpoints.
> 
> So essentially the "semweb" API would represent a coming-together of
> the REST API and the resource index.  I think Asger's current proposal
> for an alternative REST API would fit in very well with this in terms
> of exposing the kind of REST endpoints that would be needed - and would
> provide the resolvable resource URIs for the RDF representation(s).
> 
> The Resource Index and graphs (models)
> ======================================
> Currently the Fedora Resource Index is a single graph, <#ri> (or
> <rmi://someserver/fedora#ri>).
> 
> Mulgara supports creation of multiple models (or graphs) and querying
> across multiple graphs.  (Fedora does make use of additional graphs, a
> datatyping graph, and a full text model if full text indexing is
> enabled).
> 
> Mulgara also supports creation of "View" models which do not hold
> triples, but are a view over multiple models, for instance the union of
> several graphs: http://docs.mulgara.org/itqloperations/views.html
> 
> It should therefore be possible to express a Fedora repository as a set
> of individual graphs whilst still presenting an overall single graph
> view of the repository; with sub-graphs being individually
> identifiable.
> 
> Essentially some kind of hierarchy of graphs and views, for example
> (please ignore the actual model/graph identifiers used below, I've not
> thought those through... this is just for conceptual illustration!).
> (and note that these are not Fedora resource identifiers - they are
> identifiers for graphs and sub-graphs describing Fedora resources, with
> triples containing URIs that resolve to Fedora resources.)
> 
> <#ri> - a view containing:
>   <#some:pid> - object graph for some:pid, a view containing:
>     <#some:pid/properties> - graph containing object properties
>     <#some:pid/datastreams> - a view containing:
>       <#some:pid/datastreams/rels-ext> - graph containing triples from
>         rels-ext
>       <#some:pid/datastreams/rels-int> - graph containing triples from
>         rels-int
>       <#some:pid/datastreams/dc> - graph containing triples from DC
>       <#some:pid/datastreams/{rdf datastream}> - graph containing
>         triples from some other rdf datastream
>       <#some:pid/datastreams/{dsid}/properties> - graph containing
>         properties of datastream {dsid} (state, last modified, etc)
>   <#some:otherpid> - object graph for some:otherpid, a view containing:
>     <#some:otherpid/properties> - etc
>     <#some:otherpid/datastreams> - etc
>       ...
> 
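As a rough illustration of how such a hierarchy could be resolved, the sketch
below (Python, not tied to Mulgara) treats a view as a named list of members
and expands it down to the graphs that actually hold triples.  The
identifiers are the conceptual ones from the outline above; the VIEWS table
and the leaf_graphs function are hypothetical names, not part of Fedora or
Mulgara.

```python
# View name -> member graphs/views. Any name that is not a key here is
# treated as a plain graph that holds triples.
VIEWS = {
    "#ri": ["#some:pid", "#some:otherpid"],
    "#some:pid": ["#some:pid/properties", "#some:pid/datastreams"],
    "#some:pid/datastreams": [
        "#some:pid/datastreams/rels-ext",
        "#some:pid/datastreams/rels-int",
        "#some:pid/datastreams/dc",
    ],
    # The members of #some:otherpid/datastreams are left unspecified ("etc")
    # in the outline, so it is not expanded any further here.
    "#some:otherpid": ["#some:otherpid/properties", "#some:otherpid/datastreams"],
}


def leaf_graphs(name, views=VIEWS):
    """Return the triple-holding graphs that `name` ultimately covers."""
    if name not in views:
        return [name]
    leaves = []
    for member in views[name]:
        leaves.extend(leaf_graphs(member, views))
    return leaves


datastream_graphs = leaf_graphs("#some:pid/datastreams")
all_graphs = leaf_graphs("#ri")
```

A query against <#ri> would then conceptually run over everything in
all_graphs, while a query against <#some:pid/datastreams> would only see the
three per-datastream graphs.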
> There's undoubtedly stuff I haven't thought about that should be
> included above (notably disseminators).  And there's probably a better
> design of this hierarchy.  But as a principle...
> 
> The top-level <#ri> graph would still look like it does today.
> 
> This top level view could be (disseminated from) a "special" Fedora
> object representing the repository itself (an idea I know has been
> floating around).
> 
> This could get around the situation where if one allowed arbitrary RDF
> datastreams to be propagated to the resource index, and two datastreams
> assert the same triple, deletion of one of the datastreams results in
> deletion of the triple in the resource index although the triple is
> still being asserted by the second datastream.
> 
> In the above example, if a triple was asserted by two different
> datastreams then the triple would be present in two different graphs
> (one graph for each datastream).  The top level <#ri> view would show a
> single triple, however deletion of the triple from one rdf datastream
> would result in it being removed from one graph whilst still leaving it
> in the graph for the other datastream, and therefore it would still be
> asserted in the resource index.
> 
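The deletion behaviour described above can be modelled in a few lines.  This
is only a toy in-memory sketch in Python, assuming one set of triples per
graph and a view computed as a union; GraphStore, the graph names and the
example predicate are all hypothetical and say nothing about how Mulgara
stores models internally.

```python
class GraphStore:
    """Toy store: one set of triples per named graph."""

    def __init__(self):
        self.graphs = {}

    def add(self, graph, triple):
        self.graphs.setdefault(graph, set()).add(triple)

    def remove(self, graph, triple):
        self.graphs.get(graph, set()).discard(triple)

    def view(self, graph_names):
        """Union of the named graphs; duplicate triples collapse to one."""
        result = set()
        for name in graph_names:
            result |= self.graphs.get(name, set())
        return result


RELS_EXT = "#some:pid/datastreams/rels-ext"
OTHER_RDF = "#some:pid/datastreams/other-rdf"  # stands in for {rdf datastream}
TRIPLE = (
    "info:fedora/some:pid",
    "http://example.org/rel#relatedTo",  # made-up predicate
    "info:fedora/some:otherpid",
)

store = GraphStore()
store.add(RELS_EXT, TRIPLE)
store.add(OTHER_RDF, TRIPLE)
both_asserting = store.view([RELS_EXT, OTHER_RDF])

# Deleting the triple from one datastream's graph leaves the other assertion.
store.remove(OTHER_RDF, TRIPLE)
one_asserting = store.view([RELS_EXT, OTHER_RDF])

store.remove(RELS_EXT, TRIPLE)
none_asserting = store.view([RELS_EXT, OTHER_RDF])
```

The view shows the triple once while either datastream asserts it, and the
triple only disappears when the last asserting graph drops it, without any
explicit reference counting.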
> Resolvable RI URIs - being more Semantic Web- and Web 2.0-friendly
> ==================================================================
> The resource index uses the "fedora" namespace in the info URI scheme
> to identify objects, datastreams, disseminators etc, eg
> <info:fedora/some:pid>.
> 
> It could also be useful to expose resolvable URIs in the resource
> index, as an alternative view.  For instance, something akin to a
> URL-rewriting mechanism could be used to transform
> <info:fedora/some:pid> into http://server:port/fedora/objects/some:pid
> (using the proposed alternative REST API syntax).
> 
> On the way in, queries (updates, etc) would have resolvable http
> identifiers translated back to the info:fedora scheme.  (So RELS-EXT,
> RELS-INT etc would continue to use the info:fedora scheme.)
> 
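A minimal sketch of that rewriting, in Python, might look like the following.
It only covers object URIs of the form info:fedora/{pid}; the base URL is a
placeholder, the function names are invented, and datastream or disseminator
URIs would need their own rules once the REST syntax is settled.

```python
INFO_PREFIX = "info:fedora/"
BASE = "http://localhost:8080/fedora"  # placeholder for http://server:port/fedora


def to_http(uri, base=BASE):
    """Rewrite info:fedora/{pid} to a resolvable object URL (outbound)."""
    if not uri.startswith(INFO_PREFIX):
        return uri
    pid = uri[len(INFO_PREFIX):]
    if "/" in pid:
        return uri  # datastream/disseminator URIs are out of scope here
    return f"{base}/objects/{pid}"


def to_info(uri, base=BASE):
    """Rewrite a resolvable object URL back to info:fedora/{pid} (inbound)."""
    prefix = f"{base}/objects/"
    if not uri.startswith(prefix):
        return uri
    pid = uri[len(prefix):]
    if "/" in pid:
        return uri
    return INFO_PREFIX + pid


external = to_http("info:fedora/some:pid")
internal = to_info(external)
```

Anything that is not a recognised object URI passes through unchanged, so
URIs for resources outside the repository are left alone in both directions.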
> Essentially this would be an "external" view of the resource index
> containing resolvable URIs for Fedora resources that are also REST
> endpoints.
> 
> It should also be possible to disseminate sub-graphs with resolvable
> URIs as (for example) OAI-ORE resource maps.
> 
> Mapping between Fedora objects and the resource index
> =====================================================
> Currently the specification of what triples get created for Fedora
> objects, datastreams and properties is embodied in imperative Java
> code.
> 
> It could be possible to move this to a declarative specification,
> perhaps as part of the CMA.
> 
> For instance the base content model that every object belongs to could
> specify:
> - an XSLT for generating the "system" triples for Fedora object and
>   datastream properties, relationships between objects, datastreams and
>   disseminators; and which graph the triples should be added to
> - an XSLT for generating triples from RELS-EXT; and which graph the
>   triples should be added to
> - an XSLT for generating triples from RELS-INT; and which graph the
>   triples should be added to
> 
> "User" content models could for instance specify that XML metadata
> datastream xyz should be converted using an XSLT into RDF, and the
> content model would also indicate what graph the triples should be
> created in.
> 
> (XSLT is just used as an example, there may be better/alternative
> approaches, such as GRDDL, and a combination of methods may be best)
> 
> Validation criteria (rdf schema, ontology, xml schema etc) could also
> be defined in a similar manner.
> 
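One way to picture such a declarative specification is as a table of rules
rather than code.  The Python sketch below is purely illustrative: TripleRule,
the stylesheet file names and the graph templates are all invented, and a
real version would presumably live in a content model datastream rather than
in Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TripleRule:
    source: str        # datastream ID, or "system" for object/datastream properties
    transform: str     # stylesheet to apply (hypothetical file names)
    target_graph: str  # graph template; {pid} is filled in per object


# Rules a base content model might declare (illustrative only).
BASE_MODEL_RULES = [
    TripleRule("system", "system-properties.xsl", "#{pid}/properties"),
    TripleRule("RELS-EXT", "rels-ext.xsl", "#{pid}/datastreams/rels-ext"),
    TripleRule("RELS-INT", "rels-int.xsl", "#{pid}/datastreams/rels-int"),
]

# A "user" content model adding a rule for XML metadata datastream xyz.
USER_MODEL_RULES = [
    TripleRule("xyz", "xyz-to-rdf.xsl", "#{pid}/datastreams/xyz"),
]


def plan_for(pid, rules):
    """List (source, transform, graph) steps for indexing one object."""
    return [(r.source, r.transform, r.target_graph.format(pid=pid)) for r in rules]


plan = plan_for("some:pid", BASE_MODEL_RULES + USER_MODEL_RULES)
```

The indexing code would then only need to walk the plan, run each transform
and write the result to the named graph, with nothing object-type-specific
left in Java.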
> Unified relationships API
> =========================
> Having declarative specifications of the relationship between graphs
> in the resource index and the Fedora object model would help in
> implementing a unified relationships API - ie a method of specifying
> modifications to triples at the repository level, with the API
> resolving this to what it represents in terms of Fedora
> objects/datastreams and performing the necessary modifications on
> these.
> 
> Persistence is fundamental - all relationships should be stored in the
> filesystem - adding triples to Mulgara without persisting them in the
> Fedora object model should not be allowed.
> 
> This needs thinking about more, for instance if an arbitrary triple is
> to be added, what object should it be stored in (that is a triple that
> does not make an assertion about a Fedora object or datastream for
> example)?  Should it be possible to add a triple(s) that assert a new
> datastream or Fedora object?  (ie having a completely RDF-centric API).
> 
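To show what "resolving" a triple to its storage location could mean, here is
a deliberately naive Python sketch.  It assumes the usual convention that
RELS-EXT holds statements about the object and RELS-INT holds statements
about its datastreams, ignores DC and other RDF datastreams, and the function
name target_datastream is invented.

```python
INFO_PREFIX = "info:fedora/"


def target_datastream(subject):
    """Guess which (pid, datastream) should persist a triple with this subject.

    Returns None when the subject is not a Fedora resource, which is exactly
    the open question about where arbitrary triples should be stored.
    """
    if not subject.startswith(INFO_PREFIX):
        return None
    path = subject[len(INFO_PREFIX):]
    pid, _, rest = path.partition("/")
    # A bare PID is the object itself; anything below it is a datastream.
    return (pid, "RELS-INT" if rest else "RELS-EXT")


about_object = target_datastream("info:fedora/some:pid")
about_datastream = target_datastream("info:fedora/some:pid/DS1")
about_other = target_datastream("http://example.org/thing")
```

With declarative graph-to-datastream mappings in place, a lookup like this
could be driven by the content model instead of being hard-coded.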
> 
> 
> Regards
> Steve
> 
> 
> _______________________________________________
> Fedora-commons-developers mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
> 

