On 12/11/12 19:42, Reto Bachmann-Gmür wrote:
On Mon, Nov 12, 2012 at 5:46 PM, Andy Seaborne <a...@apache.org> wrote:
On 09/11/12 09:56, Rupert Westenthaler wrote:
RDF libs:
====
From the viewpoint of Apache Stanbol one needs to ask the question
whether it makes sense to maintain its own RDF API. I expect the Semantic
Web standards to evolve quite a bit in the coming years, and I have
concerns about whether the Clerezza RDF modules will be updated/extended
to provide implementations of those. One example of such a situation is
SPARQL 1.1, which has been around for quite some time and is still not
supported by Clerezza. While I do like the small API, the flexibility
to use different TripleStores, and the fact that Clerezza comes with OSGi
support, I think given the current situation we would need to discuss
all options, and those also include a switch to Apache Jena or
Sesame. Especially Sesame would be an attractive option, as their RDF
Graph API [1] is very similar to what Clerezza uses. Apache Jena's
counterparts (Model [2] and Graph [3]) are considerably different and
more complex interfaces. In addition, Jena will only change to
org.apache packages with the next major release, so a switch before
that release would mean two incompatible API changes.
Jena isn't changing the packaging as such -- what we've discussed is
providing a package for the current API and then a new, org.apache API.
The new API may be much the same as the existing one or it may be
different - that depends on contributions made!
I didn't know about Jena planning to introduce such a common API.
I'd like to hear more about your experiences, especially with the Graph
API, as that is supposed to be quite simple - it's targeted at storage
extensions as well as at supporting the richer Model API. Personally,
aside from the fact that Clerezza enforces slot constraints (no literals
as subjects), the Jena Graph API and the Clerezza RDF core API seem
reasonably aligned.
Yes, the slot constraints come from the RDF abstract syntax. In my opinion
it's something one could decide to relax: by adding appropriate owl:sameAs
bnodes, any graph could be transformed into an rdf-abstract-syntax
compliant one. So maybe have a GenericTripleCollection that can be
converted to an RDFTripleCollection - not sure. Just sticking to the spec
and waiting until this is allowed by the abstract syntax might be the
easiest.
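A minimal sketch of that transformation, assuming made-up Node/Triple types (none of these names are actual Clerezza or Jena API): each generalized triple with a literal subject is rewritten to use a fresh bnode, linked back to the literal via owl:sameAs.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the transformation described above: a generalized triple whose
// subject is a literal is rewritten to use a fresh bnode, linked back to
// the literal via owl:sameAs. All type and method names here are
// illustrative placeholders, not part of any actual Clerezza or Jena API.
public class GeneralizedToRdf {
    static final String OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs";

    enum Kind { IRI, BNODE, LITERAL }
    record Node(Kind kind, String value) {}
    record Triple(Node s, Node p, Node o) {}

    private static int counter = 0;

    static List<Triple> toRdfCompliant(List<Triple> generalized) {
        List<Triple> out = new ArrayList<>();
        for (Triple t : generalized) {
            if (t.s().kind() == Kind.LITERAL) {
                // Replace the literal subject with a fresh bnode ...
                Node b = new Node(Kind.BNODE, "b" + (counter++));
                out.add(new Triple(b, t.p(), t.o()));
                // ... and keep the connection; literals are legal objects.
                out.add(new Triple(b, new Node(Kind.IRI, OWL_SAME_AS), t.s()));
            } else {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Node lit = new Node(Kind.LITERAL, "hello");
        Node p = new Node(Kind.IRI, "http://example.org/p");
        Node o = new Node(Kind.IRI, "http://example.org/o");
        List<Triple> compliant = toRdfCompliant(List.of(new Triple(lit, p, o)));
        System.out.println(compliant.size()); // 2: rewritten triple + sameAs link
    }
}
```

The result is abstract-syntax compliant, at the cost of one extra triple per rewritten subject.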
At the core, unconstrained slots have worked best for us.
Then either:
1/ have a test like:
Triple.isValidRDF
2/ layer an app API to impose the constraints (but it's easy to run out
of good names).
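Option 1/ could look roughly like this. Triple.isValidRDF is only a proposal above, so the sketch uses stand-in types, not the real Jena Graph API:

```java
// Sketch of option 1/: triples are unconstrained at the core, with a
// validity test for the RDF abstract syntax applied where an application
// needs it. Node, Kind and Triple are stand-in types, not the Jena API.
public class TripleValidity {
    enum Kind { IRI, BNODE, LITERAL }
    record Node(Kind kind, String value) {}

    record Triple(Node s, Node p, Node o) {
        // RDF abstract syntax: subject is an IRI or bnode, predicate is an
        // IRI, object is unconstrained.
        boolean isValidRDF() {
            return s.kind() != Kind.LITERAL && p.kind() == Kind.IRI;
        }
    }

    public static void main(String[] args) {
        Node iri  = new Node(Kind.IRI, "http://example.org/s");
        Node pred = new Node(Kind.IRI, "http://example.org/p");
        Node lit  = new Node(Kind.LITERAL, "hello");

        // A generalized triple with a literal subject is representable,
        // but fails the validity test.
        System.out.println(new Triple(iri, pred, lit).isValidRDF()); // true
        System.out.println(new Triple(lit, pred, iri).isValidRDF()); // false
    }
}
```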
The Graph/Node/Triple level in Jena is an API, but its primary role is
the other side, towards storage and inference, not apps.
Generality gives:
A/ future proofing (not perfect)
B/ what arises naturally in inference and query
C/ the ability to use RDF structures for processing RDF
Nodes in triples can be variables, and I would have found it useful to
have marker nodes to be able to build structures e.g. "known to be bound
at this point in a query". As it was, I ended up creating parallel
structures.
Where I see advantages of the Clerezza API:
- Based on the collections framework, so standard tools can be used for graphs
Given a core system API, Scala and Clojure and even different Java
APIs for different styles are all possible.
A universal API across systems is about plugging in machinery (parsers,
query engines, storage, inference). It's good to separate that from
application APIs, otherwise there is a design tension.
- Immutable graphs follow the identity criterion of the RDF semantics;
this allows graph components to be added to sets and makes it more
straightforward to implement diff and patch algorithms
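The diff/patch point can be illustrated with plain java.util sets. Triple, Diff, diff and patch here are hypothetical names for illustration, not the Clerezza API:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of diff/patch over immutable triple sets, as enabled by graphs
// with value identity. Triple, Diff, diff and patch are hypothetical
// names, not the Clerezza API.
public class GraphDiffSketch {
    record Triple(String s, String p, String o) {}

    // A diff is just two triple sets: what to remove and what to add.
    record Diff(Set<Triple> removals, Set<Triple> additions) {}

    static Diff diff(Set<Triple> from, Set<Triple> to) {
        Set<Triple> removals = new HashSet<>(from);
        removals.removeAll(to);
        Set<Triple> additions = new HashSet<>(to);
        additions.removeAll(from);
        return new Diff(Set.copyOf(removals), Set.copyOf(additions));
    }

    static Set<Triple> patch(Set<Triple> graph, Diff d) {
        Set<Triple> out = new HashSet<>(graph);
        out.removeAll(d.removals());
        out.addAll(d.additions());
        return Set.copyOf(out);  // result is immutable again
    }

    public static void main(String[] args) {
        Set<Triple> g1 = Set.of(new Triple("s", "p", "a"), new Triple("s", "p", "b"));
        Set<Triple> g2 = Set.of(new Triple("s", "p", "b"), new Triple("s", "p", "c"));
        System.out.println(patch(g1, diff(g1, g2)).equals(g2)); // true
    }
}
```

Because the triples are values with structural equality, the patch roundtrip needs no id bookkeeping.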
- BNodes have no ids: apart from promoting the usage of URIs where this
is appropriate, it allows behind-the-scenes leanification and saves
memory where the backend doesn't have such ids.
We have argued about this before.
+ As you have objects, there is a concept of identity (you can tell two
bNodes apart).
+ For persistence, an internal id is necessary to reconstruct
consistently with caches.
+ Leaning isn't a core feature of RDF. In fact, IIRC, mention is going
to be removed. It's information reduction, not data reduction.
+ There will be a skolemization Note from the RDF-WG to deal with the
practical matters of handling bNodes.
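For reference, the skolemization idea (which later landed in RDF 1.1 Concepts as "skolem IRIs" under the /.well-known/genid/ path) can be sketched as follows; the class name and base URI are made up:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Sketch of skolemization: each blank node label is replaced by a
// globally unique "skolem IRI" under the /.well-known/genid/ path, so a
// graph can be exchanged without bnode identity problems. The class and
// method names are illustrative, not from any library.
public class SkolemizerSketch {
    private final String base;                       // e.g. "http://example.org"
    private final Map<String, String> seen = new HashMap<>();

    SkolemizerSketch(String base) { this.base = base; }

    // Same bnode label -> same skolem IRI within one run; fresh UUIDs
    // keep IRIs from different runs distinct.
    String skolemize(String bnodeLabel) {
        return seen.computeIfAbsent(bnodeLabel,
            l -> base + "/.well-known/genid/" + UUID.randomUUID());
    }

    public static void main(String[] args) {
        SkolemizerSketch sk = new SkolemizerSketch("http://example.org");
        String a1 = sk.skolemize("b0");
        String a2 = sk.skolemize("b0");
        String b  = sk.skolemize("b1");
        System.out.println(a1.equals(a2)); // true: stable per label
        System.out.println(a1.equals(b));  // false: distinct labels
    }
}
```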
RDF as the data model for linked data.
It's a data structure with good properties for combining. And it has links.
(for generalised systems such as rules engine - and for SPARQL - triples
can arise with extras like literals as subjects; they get removed later)
If this is to be an API for interoperability based on the RDF standard,
I wonder whether it should be possible to expose such intermediate
constructs.
My suggestion is that the API for interoperability is designed to
support RDF standards.
The key elements are IRIs, literals, Triples, Quads, Graphs, Datasets.
But also storage, SPARQL (Query and Update), and web access (e.g. conneg).
(and inference, but it seems to me that inference has adopted more
"individual" (data object) styles, not triple styles)
Reto
Andy