Hi folks,

I am starting to wonder whether I am using Clerezza, and semantic technology in general, in the wrong way. To make this clearer, I will first describe my situation and then explain why I think I might be using it incorrectly.

Context
=======

Basically, we are gathering and distributing annotations. For this we make use of OpenAnnotation (OA, see [1]). Since OA is based on RDF, we were looking for products capable of storing this data. We decided to use Clerezza as an abstraction over the actual storage layer, so that we can switch storage engines quite easily.

Now it turns out that our annotations should support annotations on annotations. Among other things, this lets us tell whether a root annotation has been properly processed or rejected (a status change). This leads us to the notion of annotation trees. Every one of these trees starts with a single annotation as its root/trunk.

The system we work on not only stores annotations but also has to return complete annotation trees. For this reason we decided to store every tree in its own named graph, so that we can easily retrieve a full tree by returning the complete named graph. The downside is that we will end up with a massive number of (small) named graphs.

For the storage we decided (for the time being) to use SingleTdbDatasetTcProvider. Here also lies the root cause of why I started wondering whether we are on the right track. Looking at the SingleTdbDatasetTcProvider implementation, I have the following observations:

Observations
============

1) SingleTdbDatasetTcProvider keeps the names of graphs in 2 separate sets. This does not seem very efficient for large numbers of graph names (100k+ or possibly 1m+).

    private Set<UriRef> graphNames;
    private Set<UriRef> mGraphNames;

2) All graph names are logged on startup (activation). This is feasible for a small number, but not for a rather large number of named graphs.

3) FileTcProvider (rdf.file.storage) also keeps the names in memory.

    private Map<UriRef, FileMGraph> uriRef2MGraphMap = new HashMap<UriRef, FileMGraph>();

4) SesameNativeWeightedProvider () keeps not only the names in memory, but the graph objects as well.

    private HashMap<UriRef, SesameMGraph> mGraphs;
    private HashMap<UriRef, SesameGraph> graphs;

Are we approaching this incorrectly, or are we running into limitations of the current implementation? In other words, is a large number of named graphs supported, or are Clerezza, and maybe even semantic technology in general, simply not designed for this?

Any thoughts?

Regards,

Minto

-- 
ir. ing. Minto van der Sluis
Software innovator / renovator
Xup BV

[1] http://www.openannotation.org/