Maybe a materialized view [1] would be the solution here? It could cache the results of these types of queries without needing a whole separate index. Since ontology information is probably rarely updated, it seems like a good fit.
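
To make the idea a bit more concrete, here is a rough sketch in plain Python of what I mean. It is not Jena or TDB code: `UsageView`, its `add`/`remove` methods, and the in-memory set standing in for the store are all made up for illustration. The only assumption is that every update can be routed through one place, so the cached counts stay in sync with the data.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class UsageView:
    """Hypothetical materialized view of property and class usage counts."""

    def __init__(self):
        self.triples = set()              # stand-in for the real triple store
        self.property_counts = Counter()  # predicate -> number of triples using it
        self.class_counts = Counter()     # class -> number of rdf:type triples

    def add(self, s, p, o):
        triple = (s, p, o)
        if triple in self.triples:        # a graph is a set: ignore duplicates
            return
        self.triples.add(triple)
        self.property_counts[p] += 1
        if p == RDF_TYPE:
            self.class_counts[o] += 1

    def remove(self, s, p, o):
        triple = (s, p, o)
        if triple not in self.triples:    # nothing to undo
            return
        self.triples.remove(triple)
        self._decrement(self.property_counts, p)
        if p == RDF_TYPE:
            self._decrement(self.class_counts, o)

    @staticmethod
    def _decrement(counter, key):
        counter[key] -= 1
        if counter[key] == 0:
            del counter[key]              # drop entries that are no longer used

    def properties(self):
        """Cached answer to SELECT DISTINCT ?p WHERE {?s ?p ?o.}"""
        return sorted(self.property_counts)

    def classes(self):
        """Cached answer to SELECT DISTINCT ?cls WHERE {?i a ?cls.}"""
        return sorted(self.class_counts)
```

Listing the distinct properties or classes then only reads the small counters instead of scanning every triple, and the counts come for free. The cost moves to update time, which seems acceptable if the data changes rarely.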

-Stephen

[1] http://en.wikipedia.org/wiki/Materialized_view

> -----Original Message-----
> From: Paolo Castagna [mailto:[email protected]]
> Sent: Wednesday, October 05, 2011 7:16 AM
> To: [email protected]
> Subject: Re: [jena-dev] Building RDF Schema information from TDB Dataset [ ARQ, TDB ]
>
> Dave Reynolds wrote:
> > On Wed, 2011-10-05 at 11:22 +0100, Paolo Castagna wrote:
> >> Dave Reynolds wrote:
> >>> If you just want to list the properties and classes that are used then
> >>> you can do things like:
> >>>
> >>> SELECT DISTINCT ?p WHERE {?s ?p ?o.}
> >>>
> >>> SELECT DISTINCT ?cls WHERE {?i a ?cls.}
> >> Any idea to speed up these two queries (for large TDB datasets) is welcome! :-)
> >
> > I nearly put a comment in that response that those can be very expensive
> > queries :)
>
> :-)
>
> The fact is that often people do not put vocabularies|ontologies in their data
> (rightly so). They also want to have a list of properties and classes actually
> used in a dataset, and they want counts (i.e. how many times a property or a
> class has been actually used in a dataset).
>
> The list of properties (with counts) can be derived from the stats.opt file
> (if present).
>
> Maybe people wanting to have super fast list of distinct properties|classes
> actually used in a dataset should have a custom index just for that.
> However, one would need to intercept all updates operation and keep those
> indexes in sync with TDB indexes. I have no idea on what is the best way to
> do that.
>
> Are there better ideas?
>
> Paolo
>
> > Dave
