Re: Toy-Usecase challenge for comparing RDF APIs to wrap data (was Re: Future of Clerezza and Stanbol)

Sebastian Schaffert Tue, 13 Nov 2012 02:52:54 -0800

Hi Reto,

I don't understand the use case, and I don't think it is well suited for 
comparing different RDF APIs.


If this is really an issue, I would suggest coming up with a bigger collection 
of RDF API usage scenarios that are also relevant in practice (as proven by a 
software project using them), including scenarios for dealing with larger 
amounts of data (i.e. beyond toy examples). My scenarios typically include 
>= 100 million triples. ;-)

In addition to what Andy said about wrapper APIs, I would also like to 
emphasise the incurred memory and computation overhead of wrapper APIs. Not an 
issue if you have only a handful of triples, but a big issue when you have 100 
million.

A possible way to bypass the wrapper issue is the approach followed by JDOM for 
XML, which we tried to use also in LDPath: abstract away the whole data model 
and API using Java Generics. This is typically very efficient (at runtime you 
are working with the native types), but it is also complex and ugly (you end up 
with a big list of methods implementing delegation as in 
http://code.google.com/p/ldpath/source/browse/ldpath-api/src/main/java/at/newmedialab/ldpath/api/backend/RDFBackend.java).
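
A minimal sketch of the generics idea, assuming a much reduced, hypothetical 
Backend<Node> interface (this is not the actual LDPath RDFBackend; all names 
below are made up for illustration). Client code is written once against the 
type parameter, and each backend plugs in its native node type, so no wrapper 
objects are allocated per node:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GenericBackendSketch {

    /** Hypothetical, reduced backend interface; Node is the backend's native node type. */
    interface Backend<Node> {
        boolean isLiteral(Node n);
        String stringValue(Node n);
        Collection<Node> listObjects(Node subject, Node property);
    }

    /** Client code written once against the type parameter; works on native nodes directly. */
    static <Node> List<String> literalValues(Backend<Node> backend, Node subject, Node property) {
        List<String> result = new ArrayList<>();
        for (Node o : backend.listObjects(subject, property)) {
            if (backend.isLiteral(o)) {
                result.add(backend.stringValue(o));
            }
        }
        return result;
    }

    /** Toy backend whose native node type is String; literals are written in double quotes. */
    static class StringBackend implements Backend<String> {
        private final Map<String, List<String>> objects = new HashMap<>();

        void add(String subject, String property, String object) {
            objects.computeIfAbsent(subject + " " + property, k -> new ArrayList<>()).add(object);
        }

        @Override
        public boolean isLiteral(String n) {
            return n.length() >= 2 && n.startsWith("\"") && n.endsWith("\"");
        }

        @Override
        public String stringValue(String n) {
            // Strip the surrounding quotes of a literal, return other nodes unchanged.
            return isLiteral(n) ? n.substring(1, n.length() - 1) : n;
        }

        @Override
        public Collection<String> listObjects(String subject, String property) {
            return objects.getOrDefault(subject + " " + property, Collections.emptyList());
        }
    }
}
```

The delegation methods are where the ugliness comes from: every operation on a 
node has to go through the backend, because the node type itself is opaque to 
the client code.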

My favorite way would be a common interface-based model for RDF in Java, 
implemented by different backends. This would require the involvement of at 
least the Jena and the Sesame people. The Sesame model already comes close to 
it, but of course also adds some concepts that are specific to Sesame (e.g. the 
repository concept and the way contexts/named graphs are handled), as we 
discussed some months ago.
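
To illustrate what I mean by an interface-based model, here is a rough sketch. 
The interfaces (Node, IRI, Literal, Triple, Graph) and the in-memory 
implementation are hypothetical and only for illustration; they are neither 
the Jena nor the Sesame API, and a real agreement would have to be worked out 
with those projects:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class CommonModelSketch {

    // Hypothetical shared interfaces that different backends could implement natively.
    interface Node { String lexicalForm(); }
    interface IRI extends Node { }
    interface Literal extends Node { }
    interface Triple { Node subject(); IRI predicate(); Node object(); }
    interface Graph {
        /** Returns all matching triples; a null argument acts as a wildcard. */
        Stream<Triple> match(Node subject, IRI predicate, Node object);
        long size();
    }

    /** Shared equality for the toy node classes: same class and same lexical form. */
    static abstract class BaseNode implements Node {
        private final String lexical;
        BaseNode(String lexical) { this.lexical = lexical; }
        @Override public String lexicalForm() { return lexical; }
        @Override public boolean equals(Object o) {
            return o != null && o.getClass() == getClass()
                    && ((BaseNode) o).lexical.equals(lexical);
        }
        @Override public int hashCode() { return lexical.hashCode(); }
    }

    static final class SimpleIRI extends BaseNode implements IRI {
        SimpleIRI(String iri) { super(iri); }
    }

    static final class SimpleLiteral extends BaseNode implements Literal {
        SimpleLiteral(String value) { super(value); }
    }

    static final class SimpleTriple implements Triple {
        private final Node s; private final IRI p; private final Node o;
        SimpleTriple(Node s, IRI p, Node o) { this.s = s; this.p = p; this.o = o; }
        @Override public Node subject() { return s; }
        @Override public IRI predicate() { return p; }
        @Override public Node object() { return o; }
    }

    /** Toy list-backed graph; a real backend would use its own indexes. */
    static final class MemoryGraph implements Graph {
        private final List<Triple> triples = new ArrayList<>();

        void add(Node s, IRI p, Node o) { triples.add(new SimpleTriple(s, p, o)); }

        @Override public Stream<Triple> match(Node s, IRI p, Node o) {
            return triples.stream().filter(t ->
                    (s == null || s.equals(t.subject()))
                 && (p == null || p.equals(t.predicate()))
                 && (o == null || o.equals(t.object())));
        }

        @Override public long size() { return triples.size(); }
    }

    /** Client code depends only on the interfaces, not on any particular backend. */
    static long countWithPredicate(Graph graph, IRI predicate) {
        return graph.match(null, predicate, null).count();
    }
}
```

The sketch deliberately leaves out anything like repositories or contexts, 
since those are exactly the backend-specific concepts that are hard to agree on.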

Greetings,

Sebastian

On 12.11.2012 at 20:45, Reto Bachmann-Gmür wrote:

> May I suggest the following toy use case for comparing different API
> proposals (we know all APIs can be used for triple stores, so it seems
> interesting to see how they can be used to expose any data as RDF and what
> the space complexity of such an adapter is):
> 
> Given
> 
> interface Person {
>     String getGivenName();
>     String getLastName();
>     /**
>      * @return true if other is an instance of Person with the same GivenName
>      * and LastName, false otherwise
>      */
>     boolean equals(Object other);
> }
> 
> Provide a method
> 
> Graph getAsGraph(Set<Person> persons);
> 
> where `Graph` is the API of an RDF Graph that can change over time. The
> returned `Graph` shall (if possible) be backed by the Set passed as argument
> and thus reflect future changes to that set. The Graph shall support all
> read operations but no addition or removal of triples. It's ok if some
> iteration over the graph results in a ConcurrentModificationException if the
> set changes during iteration (as one would get when iterating over the set
> during such a modification).
> 
> - What does the code look like?
> - Is it backed by the Set and does the resulting Graph reflect changes to
> the set?
> - What's the space complexity?
> 
> Challenge accepted?
> 
> Reto
> 
> On Mon, Nov 12, 2012 at 6:11 PM, Andy Seaborne <[email protected]> wrote:
> 
>> On 11/11/12 23:22, Rupert Westenthaler wrote:
>> 
>>> Hi all ,
>>> 
>>> On Sun, Nov 11, 2012 at 4:47 PM, Reto Bachmann-Gmür <[email protected]>
>>> wrote:
>>> 
>>>> - clerezza.rdf graduates as commons.rdf: a modular java/scala
>>>> implementation of rdf related APIs, usable with and without OSGi
>>>> 
>>> 
>>> For me this immediately raises the question: Why should the Clerezza
>>> API become commons.rdf if 90+% (just a guess) of the Java RDF stuff is
>>> based on Jena and Sesame? Creating an Apache commons project based on
>>> an RDF API that is only used by a very low percentage of all Java RDF
>>> applications is not feasible. Generally I see not much room for a
>>> commons RDF project as long as there is not a commonly agreed RDF API
>>> for Java.
>>> 
>> 
>> Very good point.
>> 
>> There is a finite and bounded supply of energy of people to work on such a
>> thing and to make it work for the communities that use it.   For all of us,
>> work on A means less work on B.
>> 
>> 
>> An "RDF API" for applications needs to be more than RDF. A SPARQL engine
>> is not simply abstracted from the storage by some "list(s,p,o)" API call.
>> It will die at scale, where scale here includes in-memory usage.
>> 
>> My personal opinion is that wrapper APIs are not the way to go - they end
>> up as a new API in themselves and the fact they are backed by different
>> systems is really an implementation detail.  They end up having design
>> opinions and gradually require more and more maintenance as they add more
>> and more.
>> 
>> API bridges are better (mapping one API to another - we are really talking
>> about a small number of APIs, not 10s) as they expose the advantages of
>> each system.
>> 
>> The ideal is a set of interfaces systems can agree on.  I'm going to be
>> contributing to the interfacization of the Graph API in Jena - if you have
>> thoughts, send email to a list.
>> 
>>        Andy
>> 
>> PS See the work being done by Stephen Allen on coarse grained APIs:
>> 
>> http://mail-archives.apache.org/mod_mbox/jena-dev/201206.mbox/%3CCAPTxtVOMMWxfk2%2B4ciCExUBZyxsDKvuO0QshXF8uKhaD8txXjA%40mail.gmail.com%3E
>> 
>> 
>> 

Sebastian
-- 
| Dr. Sebastian Schaffert          [email protected]
| Salzburg Research Forschungsgesellschaft  http://www.salzburgresearch.at
| Head of Knowledge and Media Technologies Group          +43 662 2288 423
| Jakob-Haringer Strasse 5/II
| A-5020 Salzburg
