Dave Reynolds wrote:
> On Wed, 2011-10-05 at 11:22 +0100, Paolo Castagna wrote:
>> Dave Reynolds wrote:
>>> If you just want to list the properties and classes that are used then
>>> you can do things like:
>>>
>>> SELECT DISTINCT ?p WHERE {?s ?p ?o.}
>>>
>>> SELECT DISTINCT ?cls WHERE {?i a ?cls.}
>> Any idea to speed up these two queries (for large TDB datasets) is welcome!
>> :-)
>
> I nearly put a comment in that response that those can be very expensive
> queries :)
You should have put a comment in. :-)
The same person is back with what are, IMHO, reasonable needs: getting
statistical information on how RDF properties and classes are used.
I see similar requests at Talis all the time, and SPARQL + COUNT queries
over large datasets cannot do magic: they need to scan through the
entire dataset.
An alternative, for people storing their raw RDF data in S3 or HDFS in
N-Triples or N-Quads format, is to write trivial MapReduce jobs and do
the counting there. However, this is batch processing, so the counts
are not updated in real time.
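A minimal sketch of what such a counting job does. It is plain Python that imitates the map and reduce phases rather than Hadoop code, and the function names are hypothetical. It assumes line-based N-Triples/N-Quads in which subject and predicate contain no whitespace. Each line is read once, which is the same full scan a COUNT query has to perform.

```python
from collections import Counter
from itertools import chain

# Full IRI of rdf:type as it appears in N-Triples/N-Quads.
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

def map_line(line):
    """Mapper (hypothetical): emit ((kind, term), 1) pairs for one line.

    Naive parsing, an assumption: subject and predicate are whitespace-free
    tokens; the remainder holds the object, optional graph label and '.'.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return []
    parts = line.split(None, 2)
    if len(parts) < 3:
        return []
    _subj, pred, rest = parts
    pairs = [(("property", pred), 1)]
    if pred == RDF_TYPE:
        # For rdf:type the object is a class IRI or blank node (no spaces).
        cls = rest.split(None, 1)[0]
        if cls.endswith(".") and len(cls) > 1:
            cls = cls[:-1]  # handle "<C>." written without a space
        pairs.append((("class", cls), 1))
    return pairs

def reduce_pairs(pairs):
    """Reducer (hypothetical): sum the emitted counts per key."""
    totals = Counter()
    for key, n in pairs:
        totals[key] += n
    return totals

def count_usage(lines):
    return reduce_pairs(chain.from_iterable(map_line(l) for l in lines))

if __name__ == "__main__":
    data = [
        '<http://ex/a> ' + RDF_TYPE + ' <http://ex/Person> .',
        '<http://ex/a> <http://xmlns.com/foaf/0.1/name> "Alice Smith" .',
        '<http://ex/b> ' + RDF_TYPE + ' <http://ex/Person> .',
    ]
    counts = count_usage(data)
    print(counts[("class", "<http://ex/Person>")])  # 2
```

On a real cluster the mapper would run over input splits and the reducer per key. The shape of the computation stays the same, and so does the drawback: it must be rerun to reflect updates.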
I'd like to know the best way to intercept all updates, so that it is
possible to maintain an additional data structure (an index) with
up-to-date counts and statistical information.
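A minimal sketch of the kind of side index meant here, assuming the store can notify us of every triple actually added or removed. How to hook it into Jena/TDB (for example through a graph listener) is the open question and is not shown. The class and method names are hypothetical, and RDF terms are modelled as plain strings. With such counts maintained, both DISTINCT queries reduce to reading the keys.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

class UsageStats:
    """Hypothetical incrementally maintained property/class usage counts.

    Assumption: on_add/on_delete are called only for triples that were
    really inserted or removed (no duplicate adds, no deletes of absent
    triples); otherwise the counts drift from the data.
    """

    def __init__(self):
        self.properties = Counter()  # predicate -> number of triples
        self.classes = Counter()     # class -> number of rdf:type triples

    def on_add(self, s, p, o):
        self.properties[p] += 1
        if p == RDF_TYPE:
            self.classes[o] += 1

    def on_delete(self, s, p, o):
        self._decrement(self.properties, p)
        if p == RDF_TYPE:
            self._decrement(self.classes, o)

    @staticmethod
    def _decrement(counter, key):
        counter[key] -= 1
        if counter[key] <= 0:
            # Drop zero entries so the keys are exactly the terms in use.
            del counter[key]

    def distinct_properties(self):
        # Equivalent of SELECT DISTINCT ?p WHERE {?s ?p ?o.}
        return sorted(self.properties)

    def distinct_classes(self):
        # Equivalent of SELECT DISTINCT ?cls WHERE {?i a ?cls.}
        return sorted(self.classes)
```

The hard part is the wiring, not the counting. If updates happen inside transactions, the index would also have to discard changes from aborted transactions, or apply them only at commit.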
Paolo
>
> Dave