On Tue, Nov 13, 2012 at 1:31 PM, Sebastian Schaffert <[email protected]> wrote: [...]
> Despite the solution I described, I still do not think the scenario is
> well suited for evaluating RDF APIs. You also do not use Hibernate to
> evaluate whether an RDBMS is good or not.

The usecase I propose is not the only relevant one; I just think that API
comparison should be based on evaluating the suitability of the APIs for
different, concretely defined usecases. It has nothing to do with Hibernate,
nor with annotation-based object-to-RDF property mapping (of which there have
been several proposals). It is the same principle as any23 or Aperture, but on
the Java object level rather than the binary data level: I have my
infrastructure that deals with graphs, I have a Set of contacts, and the
question is what the missing bit looks like that processes this set with my
RDF infrastructure (see the first sketch below). It is a reality that people
don't (yet) have all their data as graphs; they might have some contacts in
LDAP and some mails on an IMAP server.

> >> If this is really an issue, I would suggest coming up with a bigger
> >> collection of RDF API usage scenarios that are also relevant in practice
> >> (as proven by a software project using it). Including scenarios how to
> >> deal with bigger amounts of data (i.e. beyond toy examples). My scenarios
> >> typically include >= 100 million triples. ;-)
> >>
> >> In addition to what Andy said about wrapper APIs, I would also like to
> >> emphasise the incurred memory and computation overhead of wrapper APIs.
> >> Not an issue if you have only a handful of triples, but a big issue when
> >> you have 100 million.

A wrapper doesn't mean you have in-memory objects for all the triples of your
store; that would be absurd. But if your code deals with some resources at
runtime, these resources are represented by object instances which contain at
least a pointer to the resource located in RAM. So the overhead of a wrapper
is linear in the amount of RAM the application would need anyway, and
independent of the size of the triple store (see the second sketch below).
Besides, I would like to compare possible APIs here; ideally the best API
would be widely adopted, making wrappers superfluous. (I could also mention
that the Jena Model class also wraps a Graph instance.)

> > It's a common misconception to think that java Sets are limited to 2^31-1
> > elements, but even that would be more than 100 million. In the challenge I
> > didn't ask for time complexity; it would be fair to ask for that too if
> > you want to analyze scenarios with such big numbers of triples.
>
> It is a common misconception that just because you have a 64bit
> architecture you also have 2^64 bits of memory available. And it is a
> common misconception that in-memory data representation means you do not
> need to take into account storage structures like indexes. Even if you
> represent this amount of data in memory, you will run into the same
> problem.
>
> 95% of all RDF scenarios will require persistent storage. Selecting a
> scenario that does not take this into account is useless.

I don't know where your RAM fixation comes from. My usecase doesn't mandate
in-memory storage in any way. The 2^31-1 misconception comes not from the
32bit architecture but from the fact that Set.size() is defined to return an
int value (i.e. a maximum of 2^31-1); the API is however clear that a Set can
be bigger than that (see the third sketch below). And again, other usecases
are welcome: let's look at how they can be implemented with different APIs,
how elegant the solutions are, what their runtime properties are and of
course how relevant the usecases are, to find the most suitable API.
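To make the contacts usecase concrete, here is a minimal sketch of what the
missing bit could look like, using Jena's Model API (2.x package names) as
just one of the candidate APIs; the Contact bean with getName()/getEmail()
is an invented stand-in for whatever LDAP or the mail store actually
delivers:

import java.util.Set;

import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Property;
import com.hp.hpl.jena.rdf.model.Resource;

public class ContactsToRdf {

    /** Invented bean, e.g. populated from LDAP. */
    public interface Contact {
        String getName();
        String getEmail();
    }

    private static final String FOAF = "http://xmlns.com/foaf/0.1/";

    /** Adds one foaf:Person-like resource per contact to the given model. */
    public static void addContacts(Set<Contact> contacts, Model model) {
        Property name = model.createProperty(FOAF, "name");
        Property mbox = model.createProperty(FOAF, "mbox");
        for (Contact c : contacts) {
            // blank node per contact; a real mapping might mint URIs instead
            Resource person = model.createResource();
            person.addProperty(name, c.getName());
            person.addProperty(mbox,
                    model.createResource("mailto:" + c.getEmail()));
        }
    }

    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        // addContacts(contactsFromLdap, model);
        model.write(System.out, "TURTLE");
    }
}

Doing the same exercise against the other proposed APIs would show how much
bridge code each of them needs for exactly this kind of non-graph data.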
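On the wrapper overhead, a minimal sketch of what such a wrapper typically
looks like; BackendNode is an invented placeholder for whatever node type the
wrapped API exposes, not a real class:

/**
 * A wrapping resource holds a single reference to the backend's node, so the
 * memory cost is one small object per resource the application actually
 * touches at runtime, not one object per triple in the store.
 */
public final class WrappedResource {

    /** Invented placeholder for the wrapped API's node type. */
    public interface BackendNode {
        String getUri();
    }

    private final BackendNode delegate;

    public WrappedResource(BackendNode delegate) {
        this.delegate = delegate;
    }

    public String getUri() {
        // calls are forwarded; no triples are copied into memory
        return delegate.getUri();
    }
}

So the cost scales with the application's working set, not with the 100
million triples sitting in the store.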
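And on Set.size(): the Collection javadoc states that size() returns
Integer.MAX_VALUE when a collection holds more elements than that, so a Set
is bound neither to RAM nor to 2^31-1 elements. Here is a sketch of a Set
that is a lazy view over a persistent store; Triple and TripleStore are
invented placeholders for the store's real API:

import java.util.AbstractSet;
import java.util.Iterator;

/** Invented placeholders for whatever the persistent store actually offers. */
interface Triple { }

interface TripleStore {
    Iterator<Triple> iterate(); // streams triples from disk
    long count();               // may exceed Integer.MAX_VALUE
}

/**
 * A java.util.Set that is a lazy, read-only view over a persistent store:
 * only the iterator's cursor lives in RAM, and the set may contain far more
 * than 2^31-1 elements.
 */
public class StoreBackedTripleSet extends AbstractSet<Triple> {

    private final TripleStore store;

    public StoreBackedTripleSet(TripleStore store) {
        this.store = store;
    }

    @Override
    public Iterator<Triple> iterator() {
        return store.iterate();
    }

    @Override
    public int size() {
        long count = store.count();
        // Collection.size() contract: cap at Integer.MAX_VALUE
        return count > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) count;
    }
}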
Cheers, Reto
