Hello,

I think that Sebastian Schaffert is looking at things from a large-data-set point of view, while Reto wants to evaluate clear and efficient design.
Reto says:

> Besides I would like to compare possible APIs here, ideally the best API
> would be largely adopted making wrappers superfluous. (I could also mention
> that the Jena Model class also wraps a Graph instance)

So some sort of wrappers will be implemented (a rough sketch of what such a wrapper might look like follows at the end of this mail). I think Reto is concerned with suitability, that is, that where

> ... having a wrapper on these objects that makes them RDF graphs is the
> first step to then allow processing with the generic RDF tools and e.g.
> merging with other RDF data

the object types remain available for evaluation before (and after?) insertion into the triple store (I suppose). Maybe that point touches on Sebastian's concerns?

I think that Sebastian is concerned that such a design challenge does not lead to a memory swamp, and there are several reasons for this. It is not just that large data sets use large amounts of memory if the design is wrong. It is also that other use cases require objects to be serialized early and efficiently. With RDBMS ORMs and caching this is because another part of the system - the cache, or perhaps several caches, e.g. in-memory caches or ESI - is watching for changes. That is a separate concern, of course: there must be a UUID associated with the object (or triple) to facilitate this mechanism.

Best,
Adam

On 13 November 2012 13:50, Reto Bachmann-Gmür <[email protected]> wrote:

> On Tue, Nov 13, 2012 at 1:31 PM, Sebastian Schaffert <
> [email protected]> wrote:
> [...]
>
> > Despite the solution I described, I still do not think the scenario is
> > well suited for evaluating RDF APIs. You also do not use Hibernate to
> > evaluate whether an RDBMS is good or not.
>
> The usecase I propose is not the only one; I just think that API comparison
> should be based on evaluating their suitability for different, concretely
> defined usecases. It has nothing to do with Hibernate, nor with annotation
> based object-to-RDF property mapping (of which there have been several
> proposals). It's the same principle as any23 or Aperture, but not on the
> binary data level - on the Java object level. I have my infrastructure that
> deals with graphs, I have a Set of contacts: what does the missing bit look
> like that lets me process this set with my RDF infrastructure? It's a
> reality that people don't (yet) have all their data as graphs; they might
> have some contacts in LDAP and some mails on an IMAP server.
>
> > If this is really an issue, I would suggest coming up with a bigger
> > collection of RDF API usage scenarios that are also relevant in practice
> > (as proven by a software project using it). Including scenarios how to
> > deal with bigger amounts of data (i.e. beyond toy examples). My scenarios
> > typically include >= 100 million triples. ;-)
> >
> > In addition to what Andy said about wrapper APIs, I would also like to
> > emphasise the incurred memory and computation overhead of wrapper APIs.
> > Not an issue if you have only a handful of triples, but a big issue when
> > you have 100 million.
>
> A wrapper doesn't mean you have in-memory objects for all the triples of
> your store, that's absurd. But if your code deals with some resources at
> runtime, these resources are represented by object instances which contain
> at least a pointer to the resource in RAM. So the overhead of a wrapper is
> linear in the amount of RAM the application would need anyway and
> independent of the size of the triple store.
> Besides I would like to compare possible APIs here; ideally the best API
> would be largely adopted, making wrappers superfluous. (I could also
> mention that the Jena Model class also wraps a Graph instance)
>
> > > It's a common misconception to think that Java sets are limited to
> > > 2^31-1 elements, but even that would be more than 100 million. In the
> > > challenge I didn't ask for time complexity; it would be fair to ask for
> > > that too if you want to analyze scenarios with such big numbers of
> > > triples.
> >
> > It is a common misconception that just because you have a 64bit
> > architecture you also have 2^64 bits of memory available. And it is a
> > common misconception that in-memory data representation means you do not
> > need to take into account storage structures like indexes. Even if you
> > represent this amount of data in memory, you will run into the same
> > problem.
> >
> > 95% of all RDF scenarios will require persistent storage. Selecting a
> > scenario that does not take this into account is useless.
>
> I don't know where your RAM fixation comes from. My usecases don't mandate
> in-memory storage in any way. The 2^31-1 misconception comes not from 32bit
> architecture but from the fact that Set.size() is defined to return an int
> value (i.e. a maximum of 2^31-1), but the API is clear that a Set can be
> bigger than that. And again, other usecases are welcome; let's look at how
> they can be implemented with different APIs, how elegant the solutions are,
> what their runtime properties are and of course how relevant the usecases
> are, to find the most suitable API.
>
> Cheers,
> Reto
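
To make the wrapper idea above a bit more concrete, here is a minimal sketch of wrapping an existing Set of contacts so that it can be iterated as triples. Every name in it (Contact, Triple, ContactGraphWrapper, the foaf property strings) is a made-up placeholder rather than part of Jena or any other concrete RDF API:

import java.util.Iterator;
import java.util.Set;
import java.util.stream.Stream;

// All types below are hypothetical placeholders, only meant to illustrate
// the shape of such a wrapper.
record Contact(String uri, String name, String email) {}

record Triple(String subject, String predicate, String object) {}

class ContactGraphWrapper implements Iterable<Triple> {

    private final Set<Contact> contacts; // keeps a reference only, no copy of the data

    ContactGraphWrapper(Set<Contact> contacts) {
        this.contacts = contacts;
    }

    @Override
    public Iterator<Triple> iterator() {
        // Triples are produced lazily while iterating, so the wrapper's memory
        // overhead stays proportional to the objects the application already
        // holds and does not depend on the size of any triple store.
        return contacts.stream()
                .flatMap(c -> Stream.of(
                        new Triple(c.uri(), "foaf:name", c.name()),
                        new Triple(c.uri(), "foaf:mbox", c.email())))
                .iterator();
    }
}

The only point of the sketch is that the wrapper holds a reference to the existing set and emits triples on demand, so the contacts stay available as plain Java objects while also being processable (or mergeable) as RDF.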
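
On the Set.size() point: Collection.size() is specified to return Integer.MAX_VALUE when a collection contains more than Integer.MAX_VALUE elements, so the int return type only caps what size() can report, not how many elements a Set may hold. A small sketch of a Set view over more than 2^31-1 elements (the backing LongStream and the element count are arbitrary stand-ins for a large store):

import java.util.AbstractSet;
import java.util.Iterator;
import java.util.stream.LongStream;

// Hypothetical sketch: a Set view over a store with more than 2^31-1 elements.
class HugeSetView extends AbstractSet<Long> {

    private final long elementCount = 3_000_000_000L; // > Integer.MAX_VALUE

    @Override
    public Iterator<Long> iterator() {
        // Iteration is unaffected by the size() cap; here a LongStream simply
        // stands in for whatever backing store would supply the elements.
        return LongStream.range(0, elementCount).boxed().iterator();
    }

    @Override
    public int size() {
        // Per the Collection contract, report Integer.MAX_VALUE when the real
        // cardinality exceeds what an int can express.
        return elementCount > Integer.MAX_VALUE
                ? Integer.MAX_VALUE
                : (int) elementCount;
    }
}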
