On 21/11/11 02:21, Tim Harsch wrote:
My colleague ran into an issue and sent me the following observation.  I told 
him I would relay it to the list.

##############
We have been seeing an inconsistency with the way multiple-expression FILTERs 
applied to an OPTIONAL clause are handled. At first, SPARQL queries with the 
following sort of structure,

…
WHERE {
…
     ?class3 rdfs:subClassOf foaf:Document .
     ?doc3 rdf:type ?class3 .
     ?doc3 dcterms:references ?bag3 .
     ?bag3 ?member3 ?doc
     OPTIONAL {
       ?class4 rdfs:subClassOf foaf:Document .
       ?doc4 rdf:type ?class4 .
       ?doc4 dcterms:references ?bag4 .
       ?bag4 ?member4 ?doc3
       FILTER  (!bound(?doc4))
       FILTER  (!bound(?bag4) )
     }
…

Observation: the usual idiom for negation in SPARQL 1.0 is to place the FILTER/!bound outside and after the OPTIONAL.

     OPTIONAL {
       ?class4 rdfs:subClassOf foaf:Document .
       ?doc4 rdf:type ?class4 .
       ?doc4 dcterms:references ?bag4 .
       ?bag4 ?member4 ?doc3
     }
     FILTER  (!bound(?doc4))
     FILTER  (!bound(?bag4) )


Inside the OPTIONAL, FILTER/!bound does not test whether the optional part matched. ?doc4 and ?bag4 are bound by the OPTIONAL { BGP }, so they are already bound when the LeftJoin condition is evaluated.
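
To make the difference concrete, here is a small Python sketch of LeftJoin semantics. It is only an illustration, not ARQ code: the helper names (compatible, left_join, not_bound) and the sample solutions are made up for this example.

```python
def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(b[k] == v for k, v in a.items() if k in b)


def left_join(left, right, cond):
    """Simplified LeftJoin(left, right, cond) over lists of dict solutions."""
    out = []
    for l in left:
        matched = False
        for r in right:
            if compatible(l, r):
                merged = {**l, **r}
                # The condition is evaluated on the merged solution,
                # where the optional variables are already bound.
                if cond(merged):
                    out.append(merged)
                    matched = True
        if not matched:
            # No extension survived, so the left solution is kept as-is.
            out.append(dict(l))
    return out


def not_bound(var):
    return lambda row: var not in row


# Hypothetical data: d1 is referenced by another document, d2 is not.
left = [{"doc3": "d1"}, {"doc3": "d2"}]
right = [{"doc4": "d9", "bag4": "b9", "doc3": "d1"}]

# FILTERs inside the OPTIONAL: they become the LeftJoin condition.
inside = left_join(
    left, right,
    lambda row: not_bound("doc4")(row) and not_bound("bag4")(row),
)

# FILTERs after the OPTIONAL: they apply to the LeftJoin result.
outside = [
    row
    for row in left_join(left, right, lambda row: True)
    if not_bound("doc4")(row) and not_bound("bag4")(row)
]

print(inside)   # both d1 and d2 survive, without ?doc4/?bag4
print(outside)  # only d2 survives
```

With the filters inside, every candidate extension is rejected, so every left-hand solution is kept and nothing is negated. With the filters outside, the solutions where the optional part matched are removed, which is the negation idiom.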

(Complete queries, with namespaces, are appreciated so they can be cut and pasted into tools easily.)


used to be translated, by SPARQLer Query Validator, into SPARQL Algebra that 
looked like the following:

…
(leftjoin
           (quadpattern
             (quad<urn:x-arq:DefaultGraphNode>  ?class3 rdfs:subClassOf foaf:Document)
             (quad<urn:x-arq:DefaultGraphNode>  ?doc3 rdf:type ?class3)
             (quad<urn:x-arq:DefaultGraphNode>  ?doc3 dcterms:references ?bag3)
             (quad<urn:x-arq:DefaultGraphNode>  ?bag3 ?member3 ?doc)
           )
           (quadpattern
             (quad<urn:x-arq:DefaultGraphNode>  ?class4 rdfs:subClassOf foaf:Document)
             (quad<urn:x-arq:DefaultGraphNode>  ?doc4 rdf:type ?class4)
             (quad<urn:x-arq:DefaultGraphNode>  ?doc4 dcterms:references ?bag4)
             (quad<urn:x-arq:DefaultGraphNode>  ?bag4 ?member4 ?doc3)
           )
           (exprlist (! (bound ?doc4)) (! (bound ?bag4))))))))

In the last couple of weeks, however, the “exprlist” operator never appeared, 
and instead we’d see a single, AND-ed expression:

…
(leftjoin
           (quadpattern
             (quad<urn:x-arq:DefaultGraphNode>  ?class3 rdfs:subClassOf foaf:Document)
             (quad<urn:x-arq:DefaultGraphNode>  ?doc3 rdf:type ?class3)
             (quad<urn:x-arq:DefaultGraphNode>  ?doc3 dcterms:references ?bag3)
             (quad<urn:x-arq:DefaultGraphNode>  ?bag3 ?member3 ?doc)
           )
           (quadpattern
             (quad<urn:x-arq:DefaultGraphNode>  ?class4 rdfs:subClassOf foaf:Document)
             (quad<urn:x-arq:DefaultGraphNode>  ?doc4 rdf:type ?class4)
             (quad<urn:x-arq:DefaultGraphNode>  ?doc4 dcterms:references ?bag4)
             (quad<urn:x-arq:DefaultGraphNode>  ?bag4 ?member4 ?doc3)
           )
           (&&  (! (bound ?doc4)) (! (bound ?bag4))))))))

This happens if you write:

FILTER  (!bound(?doc4) && !bound(?bag4))

That has the same effect, but it is a different query.
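
A quick way to check the "same effect" claim is to enumerate the cases, remembering that a FILTER keeps a solution only when its expression evaluates to true and that SPARQL's && is three-valued (true, false, error). The Python below is a sketch of that check; ERROR, logical_and and the two passes_* helpers are names invented for the example.

```python
from itertools import product

ERROR = "error"  # stand-in for a SPARQL expression evaluation error


def logical_and(a, b):
    """SPARQL-style &&: false wins over error, error wins over true."""
    if a is False or b is False:
        return False
    if a is ERROR or b is ERROR:
        return ERROR
    return True


def passes_exprlist(values):
    """Separate FILTERs: every expression must evaluate to true."""
    return all(v is True for v in values)


def passes_conjunction(a, b):
    """Single FILTER (a && b): the conjunction must evaluate to true."""
    return logical_and(a, b) is True


# Compare both forms over all nine combinations of input values.
results = [
    (a, b, passes_exprlist([a, b]), passes_conjunction(a, b))
    for a, b in product([True, False, ERROR], repeat=2)
]
same_effect = all(x == y for _, _, x, y in results)
print(same_effect)
```

Both forms keep a solution only when both expressions are true, so the results agree even though the algebra is written differently.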

This is OK, since it represents how we have to treat the expressions
anyway, but it just worries us that we’re shooting at a moving target.
Note that recently, the “exprlist” construct has reappeared. Either
changes are being made to the SPARQL-to-SPARQL-Algebra translation, or
we’re missing some fine point of the SPARQL grammar. Either way, we need
to know what’s going on.
##############

The exprlist form looks right. I don't recall any changes being made in this area; specifically, I don't recall any code that aggregates exprlists into && expressions.

Which version and which tools are you running?

There is code to break up && into exprlists as an optimizer step - it enables the individual filter expressions to be placed more accurately later on.
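
As a rough picture of that kind of rewrite (this is not the ARQ optimizer code, just a Python sketch with expressions modelled as nested tuples and a hypothetical split_conjunction helper):

```python
def split_conjunction(expr):
    """Flatten nested ("&&", a, b) tuples into a list of conjuncts."""
    if isinstance(expr, tuple) and expr[0] == "&&":
        parts = []
        for sub in expr[1:]:
            parts.extend(split_conjunction(sub))
        return parts
    # Anything that is not a conjunction is a single list entry.
    return [expr]


# (&& (! (bound ?doc4)) (! (bound ?bag4)))
expr = ("&&", ("!", ("bound", "?doc4")), ("!", ("bound", "?bag4")))
exprlist = split_conjunction(expr)
print(exprlist)
```

Once the conjunction is a list, each entry can be moved independently to the earliest point in the plan where its variables are available.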

The algebra output by the algebra generator (before the optimizer runs) should be stable; algebra after optimization may change.

Different storage layers use different sets of optimization steps: SDB and TDB do different things even at the level of high-level algebra rewrites, because SDB tries to leave as much as possible to the SQL optimizer, on the theory that it knows best (this is only partially true!).

Your example has been fed through the algebra-to-quad-form transform after algebra generation. Is that the only transform being applied?

        Andy



Thanks,
Tim
