Re: [jena-dev] Building RDF Schema information from TDB Dataset [ ARQ, TDB ]

Paolo Castagna Fri, 07 Oct 2011 07:36:54 -0700


Dave Reynolds wrote:
> On Wed, 2011-10-05 at 11:22 +0100, Paolo Castagna wrote: 
>> Dave Reynolds wrote:
>>> If you just want to list the properties and classes that are used then
>>> you can do things like:
>>>
>>>   SELECT DISTINCT ?p WHERE {?s ?p ?o.}
>>>
>>>   SELECT DISTINCT ?cls WHERE {?i a ?cls.}
>> Any idea to speed up these two queries (for large TDB datasets) is welcome! 
>> :-)
> 
> I nearly put a comment in that response that those can be very expensive
> queries :)


You should have put a comment in. :-)

The same person is back with, IMHO, a reasonable need: statistical
information on how RDF properties and classes are used.

I see similar requests at Talis all the time, and SPARQL COUNT queries
over large datasets cannot do magic: they need to scan through the
entire dataset.

An alternative, for people storing their raw RDF data in S3 or HDFS in
N-Triples or N-Quads format, is to write trivial MapReduce jobs
and do the counting there. However, this is batch processing, so the
counts will not be updated in real time.
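
To make the MapReduce idea concrete, here is a minimal sketch of the map
and reduce steps in plain Python (no Hadoop involved). The function names
are hypothetical, and it assumes one statement per line with terms
separated by whitespace, which is the usual shape of N-Triples and
N-Quads dumps. A real job would run the same logic distributed over the
input splits.

```python
from collections import Counter

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def map_line(line):
    """Map step: emit (key, 1) pairs for one N-Triples/N-Quads line."""
    line = line.strip()
    if not line or line.startswith("#"):
        return []  # skip blank lines and comments
    # Subject and predicate never contain whitespace, so two splits
    # are enough; the remainder holds the object (and graph, if any).
    parts = line.split(None, 2)
    if len(parts) < 3:
        return []  # malformed line, ignored in this sketch
    _subject, predicate, rest = parts
    pairs = [(("property", predicate), 1)]
    if predicate == RDF_TYPE:
        # The object of rdf:type is an IRI, so it is the first token.
        cls = rest.split()[0].rstrip(".")
        pairs.append((("class", cls), 1))
    return pairs


def reduce_counts(pairs):
    """Reduce step: sum the emitted values per key."""
    totals = Counter()
    for key, n in pairs:
        totals[key] += n
    return totals


def count_usage(lines):
    """Run map then reduce over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_line(line))
    return reduce_counts(pairs)
```

Calling count_usage(open("dump.nt")) on a hypothetical dump file returns
a Counter keyed by ("property", iri) and ("class", iri).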

I'd like to know what the best way is to intercept all updates, so
that it is possible to keep an additional data structure or index
with up-to-date counts and statistical information.
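
As an illustration of the bookkeeping only, here is a minimal Python
sketch of a store that adjusts its counts on every add and delete. The
CountingStore class and its methods are hypothetical and are not part of
Jena or TDB; where such a hook belongs in the real update path is
exactly the open question above.

```python
from collections import Counter

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


class CountingStore:
    """Hypothetical in-memory triple store that keeps usage counts current."""

    def __init__(self):
        self._triples = set()
        self.property_counts = Counter()
        self.class_counts = Counter()

    def add(self, s, p, o):
        """Add a triple; counts change only if the triple is new."""
        triple = (s, p, o)
        if triple in self._triples:
            return False  # duplicate: must not be counted twice
        self._triples.add(triple)
        self._adjust(p, o, +1)
        return True

    def delete(self, s, p, o):
        """Delete a triple; counts change only if it was present."""
        triple = (s, p, o)
        if triple not in self._triples:
            return False
        self._triples.remove(triple)
        self._adjust(p, o, -1)
        return True

    def _adjust(self, p, o, delta):
        # Keep per-property counts, dropping entries that reach zero.
        self.property_counts[p] += delta
        if self.property_counts[p] == 0:
            del self.property_counts[p]
        # rdf:type statements also feed the per-class counts.
        if p == RDF_TYPE:
            self.class_counts[o] += delta
            if self.class_counts[o] == 0:
                del self.class_counts[o]
```

The key detail is that the counts are tied to actual changes in the set
of triples, so re-adding an existing triple or deleting a missing one
leaves the statistics untouched.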

Paolo

> 
> Dave
> 
>
