On 13.11.2012, at 12:39, Reto Bachmann-Gmür wrote:

> Hi Sebastian,
>
> On Tue, Nov 13, 2012 at 11:52 AM, Sebastian Schaffert <
> [email protected]> wrote:
>
>> Hi Reto,
>>
>> I don't understand the use case, and I don't think it is well suited for
>> comparing different RDF APIs.
>>
>
> Isn't that a slight contradiction? ;)
Only a slight contradiction. I don't understand why this really is a use case ;-)

> Understanding: you have a set of contact objects, we don't care where they
> come from or how many they are, we just have some contacts. Now we would
> like to deal with them as an RDF data source.
>
> Well-suitedness: An RDF application typically doesn't have the privilege
> of having only graphs as inputs. It will have to deal with `Contact`s,
> `StockQuote`s and `WeatherForecast`s. Having a wrapper on these objects
> that makes them RDF graphs is the first step that then allows processing
> with the generic RDF tools and e.g. merging with other RDF data.

I think we have two completely different concepts of RDF here. For me, it is purely a graph database, in a similar way that an RDBMS is a relational database, and it should therefore provide means to query in a graph way (i.e. listing edges and performing structural queries). So yes, an RDF application ONLY and EXCLUSIVELY deals with graphs.

You seem to want to treat it as an object repository, i.e. in a similar way Hibernate does on top of an RDBMS. For me, this would mean adding an additional layer on top of graphs, and it does not lend itself very well to evaluating the RDF API itself.

Unfortunately, the ways Java and RDF interpret objects are very different. Where Java assumes a fixed and pre-defined schema (i.e. a class or interface), RDF is a semi-structured format with no a-priori schema requirement. Where Java has (ordered) lists, RDF (without the reification concepts) only has unordered sets. Where an object of type A in Java will always be an A, in RDF the same object (resource) can be many things at the same time (e.g. a Concert, a Calendar Entry and a Location; simply different views on the same resource).

The way we solved this in KiWi and also in the LMF is through "facading", i.e. Java interfaces (e.g. [2]) that map getters/setters to RDF properties and are handled at runtime using a Java reflection invocation handler [1].
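To make the facading idea concrete, here is a minimal, self-contained sketch of the pattern. It is not the actual LMFInvocationHandler (which also handles setters, type conversion and persistence); all names (`FacadingSketch`, `FacadeHandler`, the `http://example.org/` predicates) are illustrative. A `java.lang.reflect.Proxy` resolves getter calls to property lookups, with a plain `Map` standing in for the triple store:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

public class FacadingSketch {

    // A facade interface: each getter corresponds to an RDF property.
    interface Contact {
        String getName();
        String getEmail();
    }

    // Invocation handler that resolves getFoo() by looking up the predicate
    // http://example.org/foo in a map standing in for the triple store.
    static class FacadeHandler implements InvocationHandler {
        private final Map<String, String> triples; // predicate URI -> literal

        FacadeHandler(Map<String, String> triples) {
            this.triples = triples;
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) {
            String name = method.getName();
            if (name.startsWith("get") && name.length() > 3) {
                // derive a predicate URI from the getter name, e.g. getName -> .../name
                String predicate = "http://example.org/"
                        + Character.toLowerCase(name.charAt(3)) + name.substring(4);
                return triples.get(predicate);
            }
            throw new UnsupportedOperationException(name);
        }
    }

    // Create a facade of the given interface over the given "triples".
    @SuppressWarnings("unchecked")
    static <T> T facade(Class<T> iface, Map<String, String> triples) {
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[]{iface},
                new FacadeHandler(triples));
    }

    public static void main(String[] args) {
        Map<String, String> triples = new HashMap<>();
        triples.put("http://example.org/name", "Alice");
        triples.put("http://example.org/email", "alice@example.org");

        Contact c = facade(Contact.class, triples);
        System.out.println(c.getName());   // prints "Alice"
        System.out.println(c.getEmail());  // prints "alice@example.org"
    }
}
```

A real implementation additionally has to write setters back as triples and deal with multi-valued properties, which is exactly where the Java/RDF object mismatch shows up.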
Note that this is a layer that is totally agnostic of the underlying RDF API, and it is complicated in any case, since the Java and RDF concepts of "objects" do not go along very well with each other. Note that ELMO (from the Sesame people) implemented a very similar approach.

[1] http://code.google.com/p/lmf/source/browse/lmf-core/src/main/java/kiwi/core/services/facading/LMFInvocationHandler.java
[2] http://code.google.com/p/lmf/source/browse/lmf-core/src/main/java/kiwi/core/model/user/KiWiUser.java

Despite the solution I described, I still do not think the scenario is well suited for evaluating RDF APIs. You also do not use Hibernate to evaluate whether an RDBMS is good or not.

>
>> If this is really an issue, I would suggest coming up with a bigger
>> collection of RDF API usage scenarios that are also relevant in practice
>> (as proven by a software project using it), including scenarios for how to
>> deal with bigger amounts of data (i.e. beyond toy examples). My scenarios
>> typically include >= 100 million triples. ;-)
>>
>> In addition to what Andy said about wrapper APIs, I would also like to
>> emphasise the incurred memory and computation overhead of wrapper APIs. Not
>> an issue if you have only a handful of triples, but a big issue when you
>> have 100 million.
>>
>
> It's a common misconception to think that Java sets are limited to 2^31-1
> elements, but even that would be more than 100 million. In the challenge I
> didn't ask for time complexity; it would be fair to ask for that too if you
> want to analyze scenarios with such big numbers of triples.

It is a common misconception that just because you have a 64-bit architecture you also have 2^64 bits of memory available. And it is a common misconception that in-memory data representation means you do not need to take into account storage structures like indexes. Even if you represent this amount of data in memory, you will run into the same problems. 95% of all RDF scenarios will require persistent storage.
Selecting a scenario that does not take this into account is useless.

>
>> A possible way to bypass the wrapper issue is the approach followed by
>> JDOM for XML, which we also tried to use in LDPath: abstract away the whole
>> data model and API using Java generics. This is typically very efficient
>> (at runtime you are working with the native types), but it is also complex
>> and ugly (you end up with a big list of methods implementing delegation, as
>> in
>> http://code.google.com/p/ldpath/source/browse/ldpath-api/src/main/java/at/newmedialab/ldpath/api/backend/RDFBackend.java
>> ).
>>
>
> I think this only supported accessing graphs and not the creation of graph
> objects, so I'm afraid you can't take the challenge with that one.

In the implementation we have done, yes (to reduce the burden on the people implementing backends). It is, however, easy to apply the concept also to creating graphs.

>
>> My favorite way would be a common interface-based model for RDF in Java,
>> implemented by different backends. This would require the involvement of at
>> least the Jena and the Sesame people. The Sesame model already comes close
>> to it, but of course it also adds some concepts that are specific to Sesame
>> (e.g. the repository concept and the way contexts/named graphs are
>> handled), as we discussed some months ago.
>>
>
> Yes, that was the thread:
> http://mail-archives.apache.org/mod_mbox/incubator-stanbol-dev/201208.mbox/%3CCAMmeZRmQcQP1syT=ccDG=fsxhoqa4ocavcrbkhtxritiwt3...@mail.gmail.com%3E
>
> I think such an interface-based common API is the goal. Let's compare the
> approaches we have. Let's create different use cases to see how the existing
> APIs compare; the challenge I posed is just a start.

I mostly agree, I just don't consider your use case very relevant, especially not as the "first challenge" for an RDF API.

Greetings,

Sebastian

--
| Dr. Sebastian Schaffert                            [email protected]
| Salzburg Research Forschungsgesellschaft           http://www.salzburgresearch.at
| Head of Knowledge and Media Technologies Group     +43 662 2288 423
| Jakob-Haringer Strasse 5/II
| A-5020 Salzburg
