Re: recent changes in SPARQL Algebra or sparql.org?

Andy Seaborne Mon, 21 Nov 2011 01:30:59 -0800

On 21/11/11 02:21, Tim Harsch wrote:

My colleague ran into an issue and sent me the following observation. I told
him I would relay it to the list.


##############
We have been seeing an inconsistency with the way multiple-expression FILTERs 
applied to an OPTIONAL clause are handled. At first, SPARQL queries with the 
following sort of structure,

…
WHERE {
…
     ?class3 rdfs:subClassOf foaf:Document .
     ?doc3 rdf:type ?class3 .
     ?doc3 dcterms:references ?bag3 .
     ?bag3 ?member3 ?doc
     OPTIONAL {
       ?class4 rdfs:subClassOf foaf:Document .
       ?doc4 rdf:type ?class4 .
       ?doc4 dcterms:references ?bag4 .
       ?bag4 ?member4 ?doc3
       FILTER  (!bound(?doc4))
       FILTER  (!bound(?bag4) )

     }
…

Observation: the usual idiom for negation in SPARQL 1.0 is to place the FILTER/!bound outside and after the OPTIONAL.


     OPTIONAL {
       ?class4 rdfs:subClassOf foaf:Document .
       ?doc4 rdf:type ?class4 .
       ?doc4 dcterms:references ?bag4 .
       ?bag4 ?member4 ?doc3
     }
     FILTER  (!bound(?doc4))
     FILTER  (!bound(?bag4) )

Inside, FILTER/!bound does not test whether the optional part matched or not. ?doc4 and ?bag4 are bound by the OPTIONAL { BGP } and so are bound when the LeftJoin condition is evaluated.
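
To make the difference concrete, here is a rough Python model of LeftJoin over lists of variable bindings. It is only a sketch of the semantics, not ARQ code: the function names and the sample data ("a", "b", "x", "g") are made up for illustration, and a single shared variable ?doc3 stands in for the full patterns.

```python
def compatible(m1, m2):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def left_join(left, right, cond=lambda m: True):
    """Simplified LeftJoin(left, right, cond) over lists of dict solutions."""
    out = []
    for m1 in left:
        # Merge m1 with every compatible solution from the optional side.
        merged = [{**m1, **m2} for m2 in right if compatible(m1, m2)]
        # The condition is evaluated on the merged solution, where the
        # optional variables are already bound.
        kept = [m for m in merged if cond(m)]
        # If nothing survives, the left solution is kept unextended.
        out.extend(kept if kept else [m1])
    return out


def not_bound(*names):
    """Hypothetical helper: true when none of the named variables is bound."""
    return lambda m: all(n not in m for n in names)


# Illustrative data: document "a" is referenced by "x"; document "b" is not.
left = [{"doc3": "a"}, {"doc3": "b"}]
right = [{"doc3": "a", "doc4": "x", "bag4": "g"}]

# FILTERs inside the OPTIONAL: they become the LeftJoin condition.
inside = left_join(left, right, not_bound("doc4", "bag4"))

# FILTERs after the OPTIONAL: applied to the LeftJoin result.
outside = [m for m in left_join(left, right) if not_bound("doc4", "bag4")(m)]

print(inside)
print(outside)
```

In this model the inside form returns both "a" and "b" with nothing extra bound, because the condition can never hold on a merged solution. Only the outside form drops "a", which is the negation effect.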

(Complete queries, with namespaces, are appreciated so they can be cut-and-pasted into tools easily.)


used to be translated, by SPARQLer Query Validator, into SPARQL Algebra that 
looked like the following:

…
(leftjoin
           (quadpattern
             (quad <urn:x-arq:DefaultGraphNode> ?class3 rdfs:subClassOf foaf:Document)
             (quad <urn:x-arq:DefaultGraphNode> ?doc3 rdf:type ?class3)
             (quad <urn:x-arq:DefaultGraphNode> ?doc3 dcterms:references ?bag3)
             (quad <urn:x-arq:DefaultGraphNode> ?bag3 ?member3 ?doc)
           )
           (quadpattern
             (quad <urn:x-arq:DefaultGraphNode> ?class4 rdfs:subClassOf foaf:Document)
             (quad <urn:x-arq:DefaultGraphNode> ?doc4 rdf:type ?class4)
             (quad <urn:x-arq:DefaultGraphNode> ?doc4 dcterms:references ?bag4)
             (quad <urn:x-arq:DefaultGraphNode> ?bag4 ?member4 ?doc3)
           )
           (exprlist (! (bound ?doc4)) (! (bound ?bag4))))))))

In the last couple of weeks, however, the “exprlist” operator never appeared, 
and instead we’d see a single, AND-ed expression:

…
(leftjoin
           (quadpattern
             (quad <urn:x-arq:DefaultGraphNode> ?class3 rdfs:subClassOf foaf:Document)
             (quad <urn:x-arq:DefaultGraphNode> ?doc3 rdf:type ?class3)
             (quad <urn:x-arq:DefaultGraphNode> ?doc3 dcterms:references ?bag3)
             (quad <urn:x-arq:DefaultGraphNode> ?bag3 ?member3 ?doc)
           )
           (quadpattern
             (quad <urn:x-arq:DefaultGraphNode> ?class4 rdfs:subClassOf foaf:Document)
             (quad <urn:x-arq:DefaultGraphNode> ?doc4 rdf:type ?class4)
             (quad <urn:x-arq:DefaultGraphNode> ?doc4 dcterms:references ?bag4)
             (quad <urn:x-arq:DefaultGraphNode> ?bag4 ?member4 ?doc3)
           )
           (&& (! (bound ?doc4)) (! (bound ?bag4))))))))


This happens if you write:

FILTER  (!bound(?doc4) && !bound(?bag4) )

That has the same effect, but it is a different query.

This is OK, since it represents how we have to treat the expressions
anyway, but it just worries us that we’re shooting at a moving target.
Note that recently, the “exprlist” construct has reappeared. Either
changes are being made to the SPARQL-to-SPARQL-Algebra translation, or
we’re missing some fine point of the SPARQL grammar. Either way, we need
to know what’s going on.

##############

The exprlist form looks right. I don't recall any changes being made in this area; specifically, I don't recall any code that aggregates exprlists into && expressions.


Which version and which tools are you running?

There is code to break up && into exprlists as an optimizer step: it enables the individual filter expressions to be placed more accurately later on.
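
As a rough illustration of that kind of step (a sketch only, not the ARQ optimizer code), splitting a conjunction into a list of expressions can be modelled in Python. The nested-tuple expression format and the name `split_conjunction` are assumptions made up for this example.

```python
def split_conjunction(expr):
    """Flatten nested ("&&", a, b) tuples into a flat list of conjuncts."""
    if isinstance(expr, tuple) and expr[0] == "&&":
        parts = []
        for operand in expr[1:]:
            # Recurse so that (&& (&& a b) c) also yields [a, b, c].
            parts.extend(split_conjunction(operand))
        return parts
    # Anything that is not a conjunction is a single filter expression.
    return [expr]


# (&& (! (bound ?doc4)) (! (bound ?bag4))) written as nested tuples.
anded = ("&&", ("!", ("bound", "?doc4")), ("!", ("bound", "?bag4")))
exprlist = split_conjunction(anded)
print(exprlist)
```

Each element of the resulting list can then be treated as a separate filter, which is what makes it possible to place the expressions individually.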

The algebra output by the algebra generator (before the optimizer runs) should be stable; the algebra after optimization may change.

Different storage layers use different sets of optimization steps: SDB and TDB do different things even at the level of high-level algebra rewrites, because SDB tries to leave as much as possible to the SQL optimizer, on the theory that it knows best (this is only partially true!).

Your example has been fed through the algebra-to-quad-form transform after algebra generation. Is that the only transform being applied?


        Andy



Thanks,
Tim
