On 02/12/2021 18:32, Rob Vesse wrote:
Well that's what I understood except that isn't what's implemented in the code. 
 My sample query on the sample dataset should have failed with a query error if 
that was the case.

Following the code OpExecutor executes OpExtend by creating a QueryIterAssign 
and it sets the constructor parameter mustBeNewVar to true 
(https://github.com/apache/jena/blob/1cdaa70edd61c2fbba5777e8b2ec7bcaf90b7c75/jena-arq/src/main/java/org/apache/jena/sparql/engine/main/OpExecutor.java#L440-L446)

BUT the implementation code at 
https://github.com/apache/jena/blob/31dc0d328c4858401e5d3fa99702c97eba0383a0/jena-arq/src/main/java/org/apache/jena/sparql/engine/iterator/QueryIterAssign.java#L73-L74
 checks that flag in a false && mustBeNewVar condition so never enforces that 
requirement.  Thus it continues onto the sameValueAs check at line 76 effectively 
meaning OpExtend and OpAssign are evaluated identically right now

It should be harmless. (extend) is static checks, so at runtime it can be (assign).

OpExecutor

    // We know (parse time checking) the variable is unused so far in
    // the query so we can use QueryIterAssign knowing that it behaves
    // the same as extend. The boolean should only be a check.

and QueryIterAssign

    // Optimization may linearize to push a stream through an (extend).

We can change it to an (assign) if you want.

Is there an example that gets the wrong answer at any time?
(Is this an observable bug? Do we need to hold up 4.3.0?)

    Andy


Rob

On 02/12/2021, 18:13, "Andy Seaborne" <a...@apache.org> wrote:



     On 02/12/2021 09:31, Rob Vesse wrote:
     > Yes I did play around with some other forms and observed that the older 
assign form is still used for more complex forms
     >
     > I think the other confusion for me was that historically OpExtend 
evaluation would fail if the assignment variable already existed, but again 
digging into the code it looks like that restriction is no longer enforced.

     We have
     (extend) = BIND = OpExtend
     (assign) = LET(an extension) = OpAssign

     Still true that BIND will not allow redefinition nor reassign same
     value. It is a static condition done at algebra generation and static
     optimization.

     (assign) will allow redefinition if it is the same term (it is like
     FILTER sameTerm).

          Andy

     >
     > Rob
     >
     > On 01/12/2021, 18:27, "Andy Seaborne" <a...@apache.org> wrote:
     >
     >      4.2.0 change.
     >
     >      https://issues.apache.org/jira/browse/JENA-2150
     >
     >      OpVars now processes BIND variables (an imporvement) and so 
optimizer
     >      does generate the 4.2.0 form because it is a BIND.
     >
     >      Different patterns have different forms:
     >
     >      SELECT * {
     >         GRAPH ?g {
     >           ?s ?p ?o .
     >           FILTER ( <http://graphs/1> = ?g )
     >         }
     >      }
     >
     >      ==>
     >
     >      (assign ((?g ?*g0))
     >         (filter (= <http://graphs/1> ?g)
     >           (quadpattern (quad ?*g0 ?s ?p ?o))))
     >
     >      because ?g is not in scope in the FILTER.
     >
     >           Andy
     >
     >      On 01/12/2021 11:20, Rob Vesse wrote:
     >      > Hey Folks
     >      >
     >      >
     >      >
     >      > So I’ve been dragging some old code back up to date with Jena 4 
and noticed an interesting behavioural change around quad form algebra generation and 
optimization that kinda took me aback when I first found it.  I think the algebras 
are semantically equivalent but at initial glance I wondered whether there is a 
potential scoping issue here.
     >      >
     >      >
     >      >
     >      > Consider the following trivial dataset:
     >      >
     >      >
     >      >
     >      > <http://a> <http://b> <http://c> <http://graphs/1> .
     >      >
     >      > <http://d> <http://e> <http://c> <http://graphs/2> .
     >      >
     >      >
     >      >
     >      > And the following query:
     >      >
     >      >
     >      >
     >      > SELECT *
     >      >
     >      > WHERE
     >      >
     >      > {
     >      >
     >      >    GRAPH ?g {
     >      >
     >      >      ?s ?p ?o .
     >      >
     >      >      BIND(<http://graphs/1> AS ?g)
     >      >
     >      >    }
     >      >
     >      > }
     >      >
     >      >
     >      >
     >      > Obviously this is a somewhat odd query but it’s a simplification 
of the more general pattern of calculating the desired graph name inside of a GRAPH 
?var block in order to produce only results from some subset of graphs
     >      >
     >      >
     >      >
     >      > When translate into quads form with Jena 3.x I get the following:
     >      >
     >      >
     >      >
     >      > (assign ((?g ?*g0))
     >      >
     >      >    (extend ((?g <http://graphs/1>))
     >      >
     >      >      (quadpattern (quad ?*g0 ?s ?p ?o))))
     >      >
     >      >
     >      >
     >      > But with Jena 4.x I get this instead:
     >      >
     >      >
     >      >
     >      > (extend ((?g <http://graphs/1>))
     >      >
     >      >    (quadpattern (quad ?g ?s ?p ?o)))
     >      >
     >      >
     >      >
     >      > Note that in 3.x it rewrites the graph node in the inner quad 
pattern and uses assign to filter those after the extend but in 4.x it does not 
rewrite the graph node.  While semantically these appear equivalent, and loading the 
test data into TDB 2 and uses tdb2.tdbquery to run a quad mode execution yields the 
same results in both cases, I do wonder if there’s a potential corner case here where 
scoping gets screwed up on more complex queries?
     >      >
     >      >
     >      >
     >      > It seems in general that Jena 4.x less aggressively rewrites the 
graph name when translating OpGraph into quads form and maybe that’s perfectly fine 
but just wanted to check if this was an intended change and whether anybody else had 
encountered this.
     >      >
     >      >
     >      >
     >      > For example rewriting the query to put the BIND first inside the 
GRAPH clause yields the following algebra on 3.x:
     >      >
     >      >
     >      >
     >      > (assign ((?g ?*g0))
     >      >
     >      >    (sequence
     >      >
     >      >      (extend ((?g <http://graphs/1>))
     >      >
     >      >        (table unit))
     >      >
     >      >      (quadpattern (quad ?*g0 ?s ?p ?o))))
     >      >
     >      >
     >      >
     >      > And on 4.x:
     >      >
     >      >
     >      >
     >      > (sequence
     >      >
     >      >    (extend ((?g <http://graphs/1>))
     >      >
     >      >      (table unit))
     >      >
     >      >    (quadpattern (quad ?g ?s ?p ?o)))
     >      >
     >      >
     >      >
     >      > So again I think semantically equivalent but not rewriting the 
graph name.
     >      >
     >      >
     >      >
     >      > Any thoughts/opinions on this?
     >      >
     >      >
     >      >
     >      > Cheers,
     >      >
     >      >
     >      >
     >      > Rob
     >      >
     >      >
     >
     >
     >
     >




Reply via email to