On Tue, Nov 13, 2012 at 1:31 PM, Sebastian Schaffert <[email protected]> wrote: [...]
> Despite the solution I described, I still do not think the scenario is
> well suited for evaluating RDF APIs. You also do not use Hibernate to
> evaluate whether an RDBMS is good or not.

The usecase I propose is not the only relevant one; I just think that API
comparison should be based on evaluating the suitability of the APIs for
different, concretely defined usecases. It has nothing to do with Hibernate,
nor with annotation-based object-to-RDF property mapping (of which there have
been several proposals). It is the same principle as any23 or Aperture, but on
the Java object level rather than the binary data level: I have my
infrastructure that deals with graphs, I have a Set of contacts, and the
question is what the missing bit looks like that processes this set with my
RDF infrastructure (see the first sketch below). It is a reality that people
don't (yet) have all their data as graphs; they might have some contacts in
LDAP and some mails on an IMAP server.

> >> If this is really an issue, I would suggest coming up with a bigger
> >> collection of RDF API usage scenarios that are also relevant in practice
> >> (as proven by a software project using it). Including scenarios how to
> >> deal with bigger amounts of data (i.e. beyond toy examples). My scenarios
> >> typically include >= 100 million triples. ;-)
> >>
> >> In addition to what Andy said about wrapper APIs, I would also like to
> >> emphasise the incurred memory and computation overhead of wrapper APIs.
> >> Not an issue if you have only a handful of triples, but a big issue when
> >> you have 100 million.

A wrapper doesn't mean you have in-memory objects for all the triples of your
store; that would be absurd. But if your code deals with some resources at
runtime, these resources are represented by object instances which contain at
least a pointer to the resource located in RAM. So the overhead of a wrapper
is linear in the amount of RAM the application would need anyway, and
independent of the size of the triple store (see the second sketch below).
Besides, I would like to compare possible APIs here; ideally the best API
would be widely adopted, making wrappers superfluous. (I could also mention
that the Jena Model class also wraps a Graph instance.)

> > It's a common misconception to think that java Sets are limited to 2^31-1
> > elements, but even that would be more than 100 million. In the challenge I
> > didn't ask for time complexity; it would be fair to ask for that too if
> > you want to analyze scenarios with such big numbers of triples.
>
> It is a common misconception that just because you have a 64bit
> architecture you also have 2^64 bits of memory available. And it is a
> common misconception that in-memory data representation means you do not
> need to take into account storage structures like indexes. Even if you
> represent this amount of data in memory, you will run into the same
> problem.
>
> 95% of all RDF scenarios will require persistent storage. Selecting a
> scenario that does not take this into account is useless.

I don't know where your RAM fixation comes from. My usecase doesn't mandate
in-memory storage in any way. The 2^31-1 misconception comes not from the
32bit architecture but from the fact that Set.size() is defined to return an
int value (i.e. a maximum of 2^31-1); the API is however clear that a Set can
be bigger than that (see the third sketch below). And again, other usecases
are welcome: let's look at how they can be implemented with different APIs,
how elegant the solutions are, what their runtime properties are and of
course how relevant the usecases are, to find the most suitable API.
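To make the contacts usecase concrete, here is a minimal sketch of what the
missing bit could look like, using Jena's Model API (2.x package names) as
just one of the candidate APIs; the Contact bean with getName()/getEmail()
is an invented stand-in for whatever LDAP or the mail store actually
delivers:

import java.util.Set;

import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Property;
import com.hp.hpl.jena.rdf.model.Resource;

public class ContactsToRdf {

    /** Invented bean, e.g. populated from LDAP. */
    public interface Contact {
        String getName();
        String getEmail();
    }

    private static final String FOAF = "http://xmlns.com/foaf/0.1/";

    /** Adds one foaf:Person-like resource per contact to the given model. */
    public static void addContacts(Set<Contact> contacts, Model model) {
        Property name = model.createProperty(FOAF, "name");
        Property mbox = model.createProperty(FOAF, "mbox");
        for (Contact c : contacts) {
            // blank node per contact; a real mapping might mint URIs instead
            Resource person = model.createResource();
            person.addProperty(name, c.getName());
            person.addProperty(mbox,
                    model.createResource("mailto:" + c.getEmail()));
        }
    }

    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        // addContacts(contactsFromLdap, model);
        model.write(System.out, "TURTLE");
    }
}

Doing the same exercise against the other proposed APIs would show how much
bridge code each of them needs for exactly this kind of non-graph data.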
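On the wrapper overhead, a minimal sketch of what such a wrapper typically
looks like; BackendNode is an invented placeholder for whatever node type the
wrapped API exposes, not a real class:

/**
 * A wrapping resource holds a single reference to the backend's node, so the
 * memory cost is one small object per resource the application actually
 * touches at runtime, not one object per triple in the store.
 */
public final class WrappedResource {

    /** Invented placeholder for the wrapped API's node type. */
    public interface BackendNode {
        String getUri();
    }

    private final BackendNode delegate;

    public WrappedResource(BackendNode delegate) {
        this.delegate = delegate;
    }

    public String getUri() {
        // calls are forwarded; no triples are copied into memory
        return delegate.getUri();
    }
}

So the cost scales with the application's working set, not with the 100
million triples sitting in the store.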
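And on Set.size(): the Collection javadoc states that size() returns
Integer.MAX_VALUE when a collection holds more elements than that, so a Set
is bound neither to RAM nor to 2^31-1 elements. Here is a sketch of a Set
that is a lazy view over a persistent store; Triple and TripleStore are
invented placeholders for the store's real API:

import java.util.AbstractSet;
import java.util.Iterator;

/** Invented placeholders for whatever the persistent store actually offers. */
interface Triple { }

interface TripleStore {
    Iterator<Triple> iterate(); // streams triples from disk
    long count();               // may exceed Integer.MAX_VALUE
}

/**
 * A java.util.Set that is a lazy, read-only view over a persistent store:
 * only the iterator's cursor lives in RAM, and the set may contain far more
 * than 2^31-1 elements.
 */
public class StoreBackedTripleSet extends AbstractSet<Triple> {

    private final TripleStore store;

    public StoreBackedTripleSet(TripleStore store) {
        this.store = store;
    }

    @Override
    public Iterator<Triple> iterator() {
        return store.iterate();
    }

    @Override
    public int size() {
        long count = store.count();
        // Collection.size() contract: cap at Integer.MAX_VALUE
        return count > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) count;
    }
}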
Cheers, Reto
