Re: Distinct graphs

Sarven Capadisli Fri, 09 Mar 2012 06:40:25 -0800

On 12-03-09 02:49 AM, Paolo Castagna wrote:

Sarven Capadisli wrote:

On 12-03-08 02:47 PM, Paolo Castagna wrote:

Rob Vesse wrote:

Yes, one possibility that Andy and I raised in that discussion was the
use of the following:


SELECT DISTINCT ?g WHERE { GRAPH ?g { } }

Since GRAPH ?g is defined as an iteration over all graphs in the dataset
(which may of course be modified by the presence of FROM and FROM NAMED)
and the empty graph pattern returns a single empty solution (i.e. always
matches) then on paper at least this query should do the same job and be
much more performant.  Whether this query works may vary depending on
how accurately an engine actually implements the SPARQL spec because the
whole dataset/GRAPH interaction is one of the areas prone to ambiguities
in the spec and differences of opinion between implementers.
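The reasoning above can be checked on a toy model. This is a minimal sketch, assuming a dataset is just a dict from (hypothetical) graph IRIs to sets of triples; it is not how any real store evaluates SPARQL. It shows that GRAPH ?g { } yields one solution per graph, while GRAPH ?g { ?s ?p ?o } yields one per quad, which DISTINCT then has to collapse.

```python
# Toy model of an RDF dataset: named graph IRI -> set of (s, p, o) triples.
# Graph names and triples are invented for illustration.
Dataset = dict[str, set[tuple[str, str, str]]]

def graph_var_empty_pattern(ds: Dataset) -> list[dict[str, str]]:
    """GRAPH ?g { }: the empty group pattern yields exactly one (empty)
    solution per graph, so each graph name is bound once."""
    return [{"g": g} for g in ds]

def graph_var_all_triples(ds: Dataset) -> list[dict[str, str]]:
    """GRAPH ?g { ?s ?p ?o }: one solution per quad, so every quad is visited."""
    return [{"g": g, "s": s, "p": p, "o": o}
            for g, triples in ds.items() for (s, p, o) in triples]

def distinct_g(solutions: list[dict[str, str]]) -> list[str]:
    """SELECT DISTINCT ?g: project ?g and drop duplicates, keeping first-seen order."""
    return list(dict.fromkeys(sol["g"] for sol in solutions))

ds: Dataset = {
    "http://example.org/g1": {("ex:a", "ex:p", "ex:b"), ("ex:a", "ex:p", "ex:c")},
    "http://example.org/g2": {("ex:d", "ex:q", "ex:e")},
}

fast = graph_var_empty_pattern(ds)
slow = graph_var_all_triples(ds)
print(len(fast), len(slow))                   # 2 3
print(distinct_g(fast) == distinct_g(slow))   # True
```

One difference the model makes visible: a named graph with no triples still produces a solution for GRAPH ?g { } but none for GRAPH ?g { ?s ?p ?o }. Whether a given engine even records empty graphs is one of the implementation differences mentioned above.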


Indeed, the optimization might already be there... Sarven, could you
try to see if SELECT DISTINCT ?g { GRAPH ?g { } } gives you what you want,
faster?


First of all, that worked! It took about 10-15 minutes the first time I
tried it. I just ran it again, and 30 minutes in I'm still waiting for a
response. Odd.


Hi Sarven,
that is too slow for any UI interaction. I suggest you try the other approach:
you could take the opportunity to use the VoID vocabulary and/or the SPARQL 1.1
Service Description to add triples that describe your data.

This way you can make the { ?s ?p ?o } more selective and search for:
{ ?s a void:Dataset } or { ?s a sd:Dataset } or { ?s a sd:Graph }.

Have a look here:

  - http://www.w3.org/TR/void/
  - http://www.w3.org/TR/sparql11-service-description/
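As a rough illustration of the selective alternative, here is a sketch that builds a SPARQL 1.1 Protocol GET request for { ?s a void:Dataset } using only the Python standard library. The endpoint URL is hypothetical, and the sketch assumes the VoID description sits in the default graph; the request is built but not sent.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint URL: substitute your own SPARQL query endpoint.
ENDPOINT = "http://localhost:3030/dataset/query"

# Assumes the VoID description is loaded into the default graph, so the
# pattern only has to match rdf:type void:Dataset triples, not every quad.
VOID_DATASETS_QUERY = """\
PREFIX void: <http://rdfs.org/ns/void#>
SELECT DISTINCT ?ds WHERE { ?ds a void:Dataset }
"""

def build_query_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build, but do not send, a SPARQL 1.1 Protocol query-via-GET request."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})

req = build_query_request(ENDPOINT, VOID_DATASETS_QUERY)
print(req.get_method(), req.full_url)
# Sending it (needs a live endpoint):
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode("utf-8"))
```

The same helper would work for { ?s a sd:Graph } with the Service Description prefix swapped in.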

I will have the VoID+SG in the store in any case. However, the simplest of
the queries is the one that we are trying to speed up, i.e.,
SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o . } }, which I imagine to be
the most widely used, followed by SELECT DISTINCT ?g { GRAPH ?g { } }. Of
course, a consumer that's aware of VoID+SG's presence by way of
/.well-known/void.ttl will use it, but it will escape the rest. And actual
queries reveal what's really in the store, so they are more reliable than
statements making claims about it.

Are we ultimately facing the issue where, as the store gets larger,
getting the list of graphs becomes more difficult?

There is a way to add the graph names in the TDB assembler. Can this help in
any way with the queries?

Out of curiosity, how big are your GSPO.dat and GSPO.idn files in the TDB
directory? To answer your query, TDB needs to scan through all of that index,
whereas { ?s a void:Dataset } will only need to scan a small fraction of the
POSG index, I suppose.
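The difference between the two access paths can be sketched with sorted Python lists standing in for the GSPO and POSG indexes. This is a toy model, not TDB's B+tree code: the term strings are invented, and the full-scan function simply mirrors the description above of TDB walking the whole GSPO index.

```python
from bisect import bisect_left

# Toy quad store: two sorted lists standing in for the GSPO and POSG indexes.
# Term strings such as "urn:g0" and "ex:value" are invented for illustration.
RDF_TYPE = "rdf:type"
VOID_DATASET = "void:Dataset"

def build_indexes(quads):
    gspo = sorted(quads)                                  # keys ordered (g, s, p, o)
    posg = sorted((p, o, s, g) for g, s, p, o in quads)   # keys ordered (p, o, s, g)
    return gspo, posg

def distinct_graphs_full_scan(gspo):
    """Walk every GSPO entry, as a plain scan for GRAPH ?g { ?s ?p ?o } would."""
    graphs, touched = [], 0
    for g, _s, _p, _o in gspo:
        touched += 1
        if not graphs or graphs[-1] != g:  # sorted by g, so repeats are adjacent
            graphs.append(g)
    return graphs, touched

def range_scan_po(posg, p, o):
    """Seek to the first (p, o, ...) key by binary search, then read only that range."""
    i = bisect_left(posg, (p, o))
    hits, touched = [], 0
    while i < len(posg) and posg[i][:2] == (p, o):
        touched += 1
        hits.append(posg[i])
        i += 1
    return hits, touched

# 3 data graphs x 1000 quads, plus 3 VoID type triples in a metadata graph.
quads = [(f"urn:g{g}", f"urn:s{i}", "ex:value", f"ex:o{i}")
         for g in range(3) for i in range(1000)]
quads += [("urn:meta", f"urn:ds{g}", RDF_TYPE, VOID_DATASET) for g in range(3)]

gspo, posg = build_indexes(quads)
graphs, touched_full = distinct_graphs_full_scan(gspo)
hits, touched_range = range_scan_po(posg, RDF_TYPE, VOID_DATASET)
print(graphs, touched_full)       # ['urn:g0', 'urn:g1', 'urn:g2', 'urn:meta'] 3003
print(len(hits), touched_range)   # 3 3
```

The full scan's cost grows with the number of quads; the range scan's cost grows with the number of matching VoID triples, plus a logarithmic seek.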


GSPO.dat 23983030272 bytes
GSPO.idn 293601280 bytes
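To put those byte counts in perspective, here is a small arithmetic sketch. The human-readable conversion is exact. The block and record sizes are assumptions not confirmed in this thread (8 KiB blocks and 32-byte quad records made of four 8-byte NodeIds), so the quad figure is only a loose ceiling on how many entries a full GSPO scan could touch.

```python
# Sizes as reported above for the TDB directory.
GSPO_DAT = 23_983_030_272  # bytes
GSPO_IDN = 293_601_280     # bytes

# Assumptions, not confirmed in this thread: 8 KiB blocks and 32-byte quad
# records (four 8-byte NodeIds) in the GSPO data file.
BLOCK_SIZE = 8192
QUAD_RECORD = 4 * 8

def human(n: float) -> str:
    """Format a byte count with binary (1024-based) units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    i = 0
    while n >= 1024 and i < len(units) - 1:
        n /= 1024
        i += 1
    return f"{n:.1f} {units[i]}"

blocks = GSPO_DAT // BLOCK_SIZE
# Ceiling only: real blocks carry headers, B+tree leaves are rarely full,
# and the file may contain allocated-but-unused space.
max_quads = blocks * (BLOCK_SIZE // QUAD_RECORD)

print(human(GSPO_DAT), human(GSPO_IDN))  # 22.3 GiB 280.0 MiB
print(blocks, max_quads)                 # 2927616 749469696
```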

-Sarven
