On 22-2-2013 19:32, Reto Bachmann-Gmür wrote:
> Hi Minto,
>
> I think making the support for large number of named graphs more efficient
> should be feasible quite easily, so if you want to stick to the current
> design let's fix it.
How is this best fixed? I can think of two approaches to get rid of the
sets with graph names:
1) Rely on Jena Dataset.containsNamedModel(). This however does not
discriminate between read-only/readwrite graphs. So something additional
is required here.
2) Instead of the 2 sets use a single index graph to store the name and
types (read-only/readwrite) of named graphs. Something like:
<graphName> <graph.type> <read-only|readwrite>
Probably also TcProvider methods like getNames(), listGraphs(), ... need
to return intelligent (lazy loading?) sets.
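
A rough sketch of option 2, in plain Java without any Clerezza or Jena
types. The names GraphIndexSketch, GraphType and namesOfType are made up
for illustration, and a map stands in for the index graph. It only shows
the idea of one index plus a lazily filtered view; it is not how
TcProvider is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: not Clerezza's TcProvider or Jena's Dataset API.
class GraphIndexSketch {
    enum GraphType { READ_ONLY, READWRITE }

    // Stand-in for the index graph: one entry per
    // <graphName> <graph.type> <read-only|readwrite> statement.
    private final Map<String, GraphType> index = new LinkedHashMap<>();

    void register(String graphName, GraphType type) {
        index.put(graphName, type);
    }

    boolean unregister(String graphName) {
        return index.remove(graphName) != null;
    }

    // Empty when the graph name is unknown to the index.
    Optional<GraphType> typeOf(String graphName) {
        return Optional.ofNullable(index.get(graphName));
    }

    // Lazy view: names are filtered while iterating, nothing is copied
    // up front, so a large number of graphs does not cost an extra set.
    Iterable<String> namesOfType(GraphType type) {
        return () -> index.entrySet().stream()
                .filter(e -> e.getValue() == type)
                .map(Map.Entry::getKey)
                .iterator();
    }

    public static void main(String[] args) {
        GraphIndexSketch idx = new GraphIndexSketch();
        idx.register("urn:x-example:tree1", GraphType.READWRITE);
        idx.register("urn:x-example:tree2", GraphType.READ_ONLY);
        for (String name : idx.namesOfType(GraphType.READ_ONLY)) {
            System.out.println(name);
        }
    }
}
```

In a real provider the map would be replaced by lookups against the
index graph itself, so the names never have to be held in memory.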
Any thoughts or alternatives?
> But I'm not really sure why you need one graph per tree, why can't you just
> follow the relations to extract a graph from a single graph?
I prefer multiple named graphs for a number of reasons:
1) Isolation: We use ontologies derived from OpenAnnotation (OA). We
not only store the annotation but also process related information along
with the annotation. All this information together we call an annotation
tree. One of our requirements is to be able to retrieve a full isolated
annotation tree.
My feeling is that storing all annotation trees in a single graph will
lead to annotation tree interference when annotation trees show partial
overlap, for instance (in OA terms) when two trees have the same OA target.
In our current solution every annotation tree is stored in a separate
named graph. Another separate graph is used as an index for all
annotation trees (comparable to what Stanbol RuleStore does with recipes).
2) Ontology agnostic: We try to design our solution to be ontology
agnostic at the storage level. In our opinion the solution we currently
work on can be seen as a framework which can be used in multiple
situations. This is mostly in situations where the process around
annotations is slightly different. Some parts definitely do need to know
about the ontology being used, but if possible we like to keep the
ontology away from lower level components.
3) Performance: I feel that retrieving a full named graph performs
better than following relations. Of course performance optimizations
should be based on benchmarks and performance tests. But by using named
graphs I also rely on Clerezza and Jena to take care of performance
optimizations instead of coding my own relations walker.
On the downside our queries probably are going to look more complex. The
queries need to take the named graphs and named graph index into account.
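
To illustrate the interference I am worried about under 1), here is a
small self-contained Java sketch. It does not use Clerezza or Jena; the
Triple class, the walk method and all ex: names are invented for the
example, and the walker simply follows subject -> object links. When two
trees share an OA target, walking from one annotation in a merged graph
also picks up statements the other tree attached to that target, while
the per-tree graph stays isolated.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a "relations walker"; not a Clerezza or Jena API.
class TreeWalkSketch {
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    // Collects every triple reachable from root by following
    // subject -> object links, visiting each node once.
    static List<Triple> walk(List<Triple> graph, String root) {
        List<Triple> result = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.add(root);
        while (!todo.isEmpty()) {
            String node = todo.poll();
            if (!seen.add(node)) {
                continue; // already expanded
            }
            for (Triple t : graph) {
                if (t.s.equals(node)) {
                    result.add(t);
                    todo.add(t.o);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two annotation trees that share the same target ex:page.
        List<Triple> tree1 = Arrays.asList(
                new Triple("ex:anno1", "oa:hasTarget", "ex:page"),
                new Triple("ex:page", "ex:note", "added by tree 1"));
        List<Triple> tree2 = Arrays.asList(
                new Triple("ex:anno2", "oa:hasTarget", "ex:page"),
                new Triple("ex:page", "ex:note", "added by tree 2"));

        List<Triple> merged = new ArrayList<>(tree1);
        merged.addAll(tree2);

        // Walking the merged graph also returns the note from tree 2.
        System.out.println("single graph: " + walk(merged, "ex:anno1").size());
        // The named graph for tree 1 contains only its own statements.
        System.out.println("named graph:  " + walk(tree1, "ex:anno1").size());
    }
}
```

With named graphs the second case needs no walking at all: the whole
graph is the tree.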
> In general I think that its a design smell if multiple graphs are used for
> semantic reason. RDF is powerful to describe a universe in one graph. In my
> opinion multiple graphs should be used if they either have different
> origins (i.e. the represent the universe from different observers) or they
> have different access control setting (e.g. public/confidential/private).
> As there are many origins of graphs having support for large number of
> graphs in clerezza would still be nice.
I would like to stick with my current design unless there is something
fundamentally wrong with it. :-)
> Cheers,
> Reto
>
--
ir. ing. Minto van der Sluis
Software innovator / renovator
Xup BV
Mobile: +31 (0) 626 014541