On Tue, Mar 5, 2013 at 12:14 PM, Minto van der Sluis <[email protected]> wrote:
> On 22-2-2013 19:32, Reto Bachmann-Gmür wrote:
> > Hi Minto,
> >
> > I think making the support for a large number of named graphs more
> > efficient should be feasible quite easily, so if you want to stick to
> > the current design let's fix it.
> How is this best fixed? I can think of 2 approaches to get rid of the
> sets with graph names:
> 1) Rely on Jena Dataset.containsNamedModel(). This however does not
> discriminate between read-only/readwrite graphs. So something additional
> is required here.
> 2) Instead of the 2 sets use a single index graph to store the name and
> types (read-only/readwrite) of named graphs. Something like:
> <graphName> <graph.type> <read-only|readwrite>
>

Not sure why this would relevantly increase performance. Access to a set is
supposed to be fast.

>
> Probably also TcProvider methods like getNames(), listGraphs(), ... need
> to return intelligent (lazy loading?) sets.
>

+1

>
> Any thoughts or alternatives?
> > But I'm not really sure why you need one graph per tree, why can't you
> > just follow the relations to extract a graph from a single graph?
> I prefer multiple named graphs for a number of reasons:
> 1) Isolation: We use ontologies derived from OpenAnnotation (OA). We
> not only store the annotation but also process related information along
> with the annotation. All this information together we call an annotation
> tree. One of our requirements is to be able to retrieve a full isolated
> annotation tree.
> My feeling is that storing all annotation trees in a single graph will
> lead to annotation tree interference when annotation trees show partial
> overlap, for instance, in OA terms, when 2 trees have the same OA target.
> In our current solution every annotation tree is stored in a separate
> named graph. Another separate graph is used as an index for all
> annotation trees (comparable to what Stanbol RuleStore does with recipes).
> 2) Ontology agnostic: We try to design our solution to be ontology
> agnostic at the storage level. In our opinion the solution we currently
> work on can be seen as a framework which can be used in multiple
> situations. This is mostly in situations where the process around
> annotations is slightly different. Some parts definitely do need to know
> about the ontology being used, but if possible we like to keep the
> ontology away from lower level components.
> 3) Performance: I feel like retrieving a full named graph performs
> better than following relations. Of course performance optimizations
> should be based on benchmarks and performance tests. But by using named
> graphs I also rely on Clerezza and Jena to take care of performance
> optimizations instead of coding my own relations walker.
>

Ok, I see the benefits. Especially the one of the global index and the
individual named graphs for faster retrieval. What you describe as ontology
agnosticism I think is about the same as my criterion of different origins
of the data.

>
> On the downside our queries probably are going to look more complex. The
> queries need to take the named graphs and named graph index into account.
> > In general I think that it's a design smell if multiple graphs are used
> > for semantic reasons. RDF is powerful enough to describe a universe in
> > one graph. In my opinion multiple graphs should be used if they either
> > have different origins (i.e. they represent the universe from different
> > observers) or they have different access control settings (e.g.
> > public/confidential/private).
> > As there are many origins of graphs, having support for a large number
> > of graphs in Clerezza would still be nice.
> I would like to stick with my current design unless there is something
> fundamentally wrong with it. :-)
>

Good, then let's optimize the Clerezza performance. I think the SPARQL
fastlane is an important step.

Reto
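
For reference, approach 2 from the quoted mail (one index that records each graph name together with its read-only/readwrite type, instead of two separate name sets) could be sketched roughly as below. This is only an illustration in plain Java collections: GraphNameIndex, GraphType and all method names are hypothetical and are not part of the Clerezza or Jena APIs, and the index is kept in memory rather than in an RDF graph. It also shows one way getNames()-style methods could hand out a live view or a lazily filtered iteration rather than a copied set.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper for illustration only; not a Clerezza or Jena class.
class GraphNameIndex {

    // Mirrors the read-only/readwrite distinction from the thread.
    enum GraphType { READ_ONLY, READ_WRITE }

    // Single index: graph name -> type, replacing two separate name sets.
    private final Map<String, GraphType> types = new ConcurrentHashMap<>();

    // Records a graph; returns false if the name is already registered.
    boolean register(String graphName, GraphType type) {
        return types.putIfAbsent(graphName, type) == null;
    }

    // Removes a graph; returns false if the name was not registered.
    boolean remove(String graphName) {
        return types.remove(graphName) != null;
    }

    boolean contains(String graphName) {
        return types.containsKey(graphName);
    }

    // False for read-only graphs and for unknown names.
    boolean isReadWrite(String graphName) {
        return types.get(graphName) == GraphType.READ_WRITE;
    }

    // Live, unmodifiable view of all names; nothing is copied.
    Set<String> names() {
        return Collections.unmodifiableSet(types.keySet());
    }

    // Names of one type, filtered lazily each time the result is iterated.
    Iterable<String> namesOfType(GraphType wanted) {
        return () -> types.entrySet().stream()
                .filter(e -> e.getValue() == wanted)
                .map(Map.Entry::getKey)
                .iterator();
    }
}
```

Lookups here are hash-based, just like with the two sets, so this sketch by itself does not settle whether the index is faster; that would need the benchmarks mentioned above.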
