There are various ways to approach the problem: given there is already existing code and data, my take is

   https://github.com/afs/JenaSesame

showing the Jena API over Sesame, so you can plug an existing Jena application into Sesame storage with very few changes (factory calls to connect to the repository). But the real standard for efficient processing is SPARQL, because the granularity works better.

JenaSesame provides efficiency by, for example, passing query processing down and not executing over a narrow interface (if you want SPARQL 1.1 over Sesame now, the narrow interface version will work as well).
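
In outline, the switch looks something like this (an untested sketch: the JenaSesame factory method name is an assumption based on the project, and the Sesame setup is just the standard openrdf in-memory repository):

    import org.openrdf.repository.Repository;
    import org.openrdf.repository.RepositoryConnection;
    import org.openrdf.repository.sail.SailRepository;
    import org.openrdf.sail.memory.MemoryStore;

    import com.hp.hpl.jena.rdf.model.Model;

    public class JenaOverSesame {
        public static void main(String[] args) throws Exception {
            // Standard Sesame (openrdf) setup: an in-memory repository.
            Repository repository = new SailRepository(new MemoryStore());
            repository.initialize();
            RepositoryConnection connection = repository.getConnection();

            // Assumed JenaSesame factory call: the rest of the application
            // keeps using the familiar Jena Model API, unchanged.
            Model model = org.openjena.jenasesame.JenaSesame.createModel(connection);
            model.read("http://example.org/data.rdf");
        }
    }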


Even in closed environments, SPARQL 1.1 helps. It's possible to write a 3-tier application using a SPARQL-DB as the database layer and stick to standards between the business logic and database layers, giving you a choice of systems.
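
For example, the business-logic tier can talk to the storage tier purely through the SPARQL protocol, so the store behind the endpoint can be swapped without touching the application. A minimal sketch using Jena ARQ's client-side API (the endpoint URL is a placeholder):

    import com.hp.hpl.jena.query.QueryExecution;
    import com.hp.hpl.jena.query.QueryExecutionFactory;
    import com.hp.hpl.jena.query.QuerySolution;
    import com.hp.hpl.jena.query.ResultSet;

    public class SparqlTier {
        public static void main(String[] args) {
            // Placeholder endpoint URL - any store speaking the SPARQL protocol will do.
            String endpoint = "http://localhost:3030/dataset/query";
            String query =
                "SELECT ?name WHERE { ?person <http://xmlns.com/foaf/0.1/name> ?name }";

            QueryExecution qe = QueryExecutionFactory.sparqlService(endpoint, query);
            try {
                ResultSet results = qe.execSelect();
                while (results.hasNext()) {
                    QuerySolution row = results.next();
                    System.out.println(row.get("name"));
                }
            } finally {
                qe.close();
            }
        }
    }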

[[ Steve Harris, from [1]:
Five (boring) reasons why SW technology is good for companies

Strong Standards - Interoperability comparatively good
                 - Less vendor lock-in
SPARQL Protocol  - HTTP based
                 - Fits well in SoA
Schemaless Data  - MI / BI
                 - Flexibility
Scalability      - Billions of triples with open source software,
                   on basic hardware
I18N             - UTF-8
                 - Language tags
]]

        Andy


[1]
http://axel.deri.ie/presentations/20101111LightningTalksISWC2010.pdf

On 12/11/10 12:43, Henry Story wrote:

On 11 Nov 2010, at 19:01, Andy Seaborne wrote:

Bertrand suggested that this conversation happen on the clerezza-dev list; the main extracts of the messages follow in chronological order:

Hi Andy,

        Quite a few years ago I suggested that the RDF APIs should be put through the Java Community Process in order to standardise the interfaces. At the time people told me it was too early to do so. It may be time to revisit this now. Reto's work in Clerezza may be a good place to start from. I have not really looked into the details of his RDF abstraction, but it seems to work. Of course, if Sesame/Jena/Mulgara all agreed on one namespace and set of interfaces then one could remove one layer of abstraction, making things more efficient, presumably.

    All I can say is that it is nice to be able to switch between Jena, Sesame and Mulgara to see what advantage each has. Doing a comparison between these different APIs is a lot of work. I suppose I'll have time to go into the details as I start using Clerezza more.

        Currently I am working on implementing WebID ( http://webid.info/spec ) in Clerezza. It is getting to be more and more timely to do so, with tools such as FireSheep hitting the headlines, and movements such as SSL everywhere catching on [1]. DNSSEC is also changing the whole space here [2], as it will make it very easy to deploy SSL-based servers.

   I really like the way Clerezza is a fully RDF-based CMS (I know it's more than a CMS, but that is an easy way to explain what it does), and it is going to make developing what people term a Personal Data store - others a Social Web CMS - very easy.

    By the way, we want to start a WebID XG at the W3C. If anyone is interested please let me know. We already have 4 members who have put their names down.

     http://esw.w3.org/Foaf%2Bssl/WebIdWorkingGroup

    All the best,

        Henry


[1] 
http://esw.w3.org/Foaf%2Bssl/FAQ#Is_SSL_not_really_expensive_server_side_to_Process.3F_To_expensive_for__Google_.3F
[2] 
http://www.freedom-to-tinker.com/blog/sjs/major-internet-milestone-dnssec-and-ssl



        Andy


On 08/11/10 21:39, Reto Bachmann-Gmuer wrote:
Hi Jeremy

One of Clerezza's aims was to use an RDF API that is maximally close to the RDF abstract syntax and semantics. On this RDF core API we have different façades and utilities, as well as a frontend adapter implementing the Jena API. Related standards like SPARQL and the various serialization formats are supported as well; the respective engines can be added at runtime (when running in an OSGi container). We decided to design our own API as we found the various APIs available (Jena, OpenRDF, RDF2Go) would be neither as modular nor as close to the spec as we wanted them to be. The API comes with the typical utilities, like a command line tool and a Maven plugin for the transformation of vocabularies into classes.
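
To give a flavour of the core API, a minimal sketch (the class names come from the javadoc linked below; the exact constructors are assumptions):

    import org.apache.clerezza.rdf.core.MGraph;
    import org.apache.clerezza.rdf.core.UriRef;
    import org.apache.clerezza.rdf.core.impl.PlainLiteralImpl;
    import org.apache.clerezza.rdf.core.impl.SimpleMGraph;
    import org.apache.clerezza.rdf.core.impl.TripleImpl;

    public class ClerezzaCoreExample {
        public static void main(String[] args) {
            // An in-memory mutable graph; MGraph is a java.util.Collection of Triples.
            MGraph graph = new SimpleMGraph();
            UriRef book = new UriRef("http://example.org/book/1");
            UriRef title = new UriRef("http://purl.org/dc/elements/1.1/title");
            graph.add(new TripleImpl(book, title, new PlainLiteralImpl("An example title")));
            System.out.println(graph.size());
        }
    }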

Apart from the core part tightly coupled to RDF and related specs, Clerezza also provides a framework for implementing REST applications (JAX-RS). The encouraged design pattern is that requests are answered in terms of RDF (i.e. a graph and, typically, a selected resource within this graph); Clerezza takes care of content negotiation, and for RDF formats the serializer registered for that media type is used. For non-RDF formats a template (typically a Scala Server Page) is selected and takes care of the rendering.
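
A rough sketch of this pattern (the JAX-RS annotations are standard; GraphNode and its constructor are based on the resource-centric utilities javadoc linked below, so treat the details as assumptions):

    import javax.ws.rs.GET;
    import javax.ws.rs.Path;

    import org.apache.clerezza.rdf.core.MGraph;
    import org.apache.clerezza.rdf.core.UriRef;
    import org.apache.clerezza.rdf.core.impl.SimpleMGraph;
    import org.apache.clerezza.rdf.utils.GraphNode;

    @Path("/hello")
    public class HelloResource {

        // The method returns RDF (a graph plus a selected resource within it);
        // content negotiation and rendering are left to the framework.
        @GET
        public GraphNode getHello() {
            MGraph graph = new SimpleMGraph();
            UriRef resource = new UriRef("http://example.org/hello");
            // ... add triples describing the resource to the graph ...
            return new GraphNode(resource, graph);
        }
    }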

I described these parts of Clerezza because they seem to be quite close to what you suggest for commons. As it is hard to share utilities without having shared APIs for the core stuff our code deals with, I think some effort in this area could have the greatest benefit.

If you have some time, I would like to encourage feedback on the respective
APIs as currently used in Clerezza

- The core API for (mutable) graphs in:
http://incubator.apache.org/clerezza/mvn-site/org.apache.clerezza.rdf.core/apidocs/index.html
- Utilities (including resource-centric API):
http://incubator.apache.org/clerezza/mvn-site/org.apache.clerezza.rdf.utils/apidocs/index.html

These two layers are similar to the Graph/Model separation in Jena.

Cheers,
Reto


On 10/11/10 23:28, Jeremy Carroll wrote:
- The core API for (mutable) graphs in:
http://incubator.apache.org/clerezza/mvn-site/org.apache.clerezza.rdf.core/apidocs/index.html

http://incubator.apache.org/clerezza/mvn-site/org.apache.clerezza.rdf.core/apidocs/org/apache/clerezza/rdf/core/TripleCollection.html#filter(org.apache.clerezza.rdf.core.NonLiteral,%20org.apache.clerezza.rdf.core.UriRef,%20org.apache.clerezza.rdf.core.Resource)


Iterator<Triple> filter(NonLiteral subject, UriRef predicate, Resource object)

vs

http://jena.sourceforge.net/javadoc/com/hp/hpl/jena/graph/Graph.html#find(com.hp.hpl.jena.graph.Node,%20com.hp.hpl.jena.graph.Node,%20com.hp.hpl.jena.graph.Node)


ExtendedIterator<Triple>  find(Node s, Node p, Node o)

seems to be the fundamental choice.
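
To make the contrast concrete, the two calls in use (a sketch; it assumes null is the wildcard in Clerezza's filter, as Node.ANY is in Jena's find):

    import java.util.Iterator;

    import org.apache.clerezza.rdf.core.Triple;
    import org.apache.clerezza.rdf.core.TripleCollection;
    import org.apache.clerezza.rdf.core.UriRef;
    import org.apache.clerezza.rdf.core.impl.SimpleMGraph;

    import com.hp.hpl.jena.graph.Factory;
    import com.hp.hpl.jena.graph.Graph;
    import com.hp.hpl.jena.graph.Node;
    import com.hp.hpl.jena.util.iterator.ExtendedIterator;

    public class FilterVersusFind {
        public static void main(String[] args) {
            // Clerezza: positions are typed (NonLiteral, UriRef, Resource);
            // null is assumed to act as the wildcard.
            TripleCollection tc = new SimpleMGraph();
            UriRef s = new UriRef("http://example.org/s");
            Iterator<Triple> clerezzaMatches = tc.filter(s, null, null);

            // Jena: three uniform Node positions, Node.ANY as the wildcard.
            Graph graph = Factory.createDefaultGraph();
            Node subject = Node.createURI("http://example.org/s");
            ExtendedIterator<com.hp.hpl.jena.graph.Triple> jenaMatches =
                    graph.find(subject, Node.ANY, Node.ANY);
        }
    }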

The latter was the choice Chris Dollin and I made in 2002/2003, and I still find it preferable, for program uniformity, to the closer-to-the-spec choice in Clerezza. We were writing the spec at the same time, and I always saw it as a description of a Web exchange format, not of a programming interface (for instance, implementing the RDF Semantics Rec is hard with the Clerezza interface).

I am not quite sure what that means in terms of this discussion, which is more procedural than technical. As in all things, people make different choices and have different preferences, and a decision to all use the same libraries would be a restriction of design freedom on such issues, which might be good or might be bad.

===

On
http://mail-archives.apache.org/mod_mbox/incubator-clerezza-dev/201011.mbox/%[email protected]%3e

[[

- graph isomorphism code
]]

What are the goals of the Clerezza isomorphism code? The Jena code is essentially scoped to testing: I checked that small pathological cases were OK, and larger non-pathological cases, but it is not meant to have production-level performance, particularly on graphs for which something like nauty would be more appropriate.

On 11/11/10 10:21, Andy Seaborne wrote:
...
Isn't the model interface operation a more appropriate comparison, because that is what the application sees?

StmtIterator listStatements(Resource s, Property p, RDFNode o)

Graph.find is the SPI interface to storage. The Graph level has named variables, not just RDF terms; SPARQL uses this heavily.

In SPARQL, literals can occur in any position during query processing. Patterns involving literals as subjects, or as predicates, simply don't match the data (section 12.3.1).
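
For instance, a small sketch at the Jena Graph/SPI level: the call compiles and runs, it just never matches anything in an RDF graph.

    import com.hp.hpl.jena.graph.Factory;
    import com.hp.hpl.jena.graph.Graph;
    import com.hp.hpl.jena.graph.Node;

    public class LiteralSubjectPattern {
        public static void main(String[] args) {
            Graph graph = Factory.createDefaultGraph();
            // A literal in the subject position is accepted at the Graph/SPI level,
            // and is legal in a SPARQL pattern, but it cannot match RDF data.
            boolean matched =
                graph.find(Node.createLiteral("42"), Node.ANY, Node.ANY).hasNext();
            System.out.println(matched); // false
        }
    }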

Once upon a time, when we were going Jena1->Jena2, the idea was that the application API was just one presentation. There could be other RDF APIs over the SPI. There's not been a second RDF presentation API, but the design concept was there and still is. All the interfaces in the API are mainly implemented only once, and I'm not aware of any users who use the extensibility within the Resource API anymore (Parliament/BBN used to - I think they now use an associated datastructure to map to internal information for any API resources/literals from their storage). The Resource-level API implementation could be simplified if there is only one implementation of that presentation. There is generality in Jena that we thought was a good idea at the time, but looking at the way the world has gone since, not all of it is used or useful nowadays. Better use of factory/interface at the SPI would be more helpful. The experimental Jena3 core also has extension nodes and graph nodes with an eye to possible future needs from the standards world.

Social Web Architect
http://bblfish.net/
