Yes I agree that it looks safe semantically and I certainly haven't see any queries that break on this
Only spotted this because we upgraded some old code to Jena 4.x and the underlying engine there requires strict extend semantics so we were rewriting assigns to a combination of filters and unions depending on whether the assigned variable may/may not be bound. One of our optimizer test cases specifically covered this graph case and was failing because Jena wasn't rewriting that particular form to assign anymore. So more a case where our custom optimizer has to do some extra work to avoid sending the backend an extend it's going to choke on. Rob On 05/12/2021, 11:43, "Andy Seaborne" <a...@apache.org> wrote: I can't find any problems. What is ununusual is that the GRAPH to quads transform introduces a variable in the quads which was used further out (the GRAPH ?g). And ?g is not used in a filter or triple pattern -those caes do cause the temporary variable + rename to happen. Because "GRAPH ?g" is a specific a specific operation - "join the inner pattern result with the constant one column table of graph names, using join variable ?g", then pushing the effect of this inwards is through BIND is safe, I believe, even if the BIND is now reassigning due to the introduced ?g. When the quad match ?g is the same, it passes the (extend-noscope-check) which is assign. When the quad match ?g is the different, it does not passes the (extend-noscope-check). Rewriting to assign is possible but not simple because transformation is bottom up so if rewrite is necessary, an extra subtree walk is needed as the BIND may be deeply nested. See also https://afs.github.io/substitute.html about injecting var-value bindings, safely. Andy On 03/12/2021 15:37, Andy Seaborne wrote: > > > On 02/12/2021 18:32, Rob Vesse wrote: >> Well that's what I understood except that isn't what's implemented in >> the code. My sample query on the sample dataset should have failed >> with a query error if that was the case. >> >> Following the code OpExecutor executes OpExtend by creating a >> QueryIterAssign and it sets the constructor parameter mustBeNewVar to >> true >> (https://github.com/apache/jena/blob/1cdaa70edd61c2fbba5777e8b2ec7bcaf90b7c75/jena-arq/src/main/java/org/apache/jena/sparql/engine/main/OpExecutor.java#L440-L446) >> >> >> BUT the implementation code at >> https://github.com/apache/jena/blob/31dc0d328c4858401e5d3fa99702c97eba0383a0/jena-arq/src/main/java/org/apache/jena/sparql/engine/iterator/QueryIterAssign.java#L73-L74 >> checks that flag in a false && mustBeNewVar condition so never >> enforces that requirement. Thus it continues onto the sameValueAs >> check at line 76 effectively meaning OpExtend and OpAssign are >> evaluated identically right now > > It should be harmless. (extend) is static checks, so at runtime it can > be (assign). > > OpExecutor > > // We know (parse time checking) the variable is unused so far in > // the query so we can use QueryIterAssign knowing that it behaves > // the same as extend. The boolean should only be a check. > > and QueryIterAssign > > // Optimization may linearize to push a stream through an (extend). > > We can change it to an (assign) if you want. > > Is there an example that gets the wrong answer at any time? > (Is this an observable bug? Do we need to hold up 4.3.0?) > > Andy > >> >> Rob >> >> On 02/12/2021, 18:13, "Andy Seaborne" <a...@apache.org> wrote: >> >> >> >> On 02/12/2021 09:31, Rob Vesse wrote: >> > Yes I did play around with some other forms and observed that >> the older assign form is still used for more complex forms >> > >> > I think the other confusion for me was that historically >> OpExtend evaluation would fail if the assignment variable already >> existed, but again digging into the code it looks like that >> restriction is no longer enforced. >> >> We have >> (extend) = BIND = OpExtend >> (assign) = LET(an extension) = OpAssign >> >> Still true that BIND will not allow redefinition nor reassign same >> value. It is a static condition done at algebra generation and >> static >> optimization. >> >> (assign) will allow redefinition if it is the same term (it is like >> FILTER sameTerm). >> >> Andy >> >> > >> > Rob >> > >> > On 01/12/2021, 18:27, "Andy Seaborne" <a...@apache.org> wrote: >> > >> > 4.2.0 change. >> > >> > https://issues.apache.org/jira/browse/JENA-2150 >> > >> > OpVars now processes BIND variables (an imporvement) and >> so optimizer >> > does generate the 4.2.0 form because it is a BIND. >> > >> > Different patterns have different forms: >> > >> > SELECT * { >> > GRAPH ?g { >> > ?s ?p ?o . >> > FILTER ( <http://graphs/1> = ?g ) >> > } >> > } >> > >> > ==> >> > >> > (assign ((?g ?*g0)) >> > (filter (= <http://graphs/1> ?g) >> > (quadpattern (quad ?*g0 ?s ?p ?o)))) >> > >> > because ?g is not in scope in the FILTER. >> > >> > Andy >> > >> > On 01/12/2021 11:20, Rob Vesse wrote: >> > > Hey Folks >> > > >> > > >> > > >> > > So I’ve been dragging some old code back up to date with >> Jena 4 and noticed an interesting behavioural change around quad form >> algebra generation and optimization that kinda took me aback when I >> first found it. I think the algebras are semantically equivalent but >> at initial glance I wondered whether there is a potential scoping >> issue here. >> > > >> > > >> > > >> > > Consider the following trivial dataset: >> > > >> > > >> > > >> > > <http://a> <http://b> <http://c> <http://graphs/1> . >> > > >> > > <http://d> <http://e> <http://c> <http://graphs/2> . >> > > >> > > >> > > >> > > And the following query: >> > > >> > > >> > > >> > > SELECT * >> > > >> > > WHERE >> > > >> > > { >> > > >> > > GRAPH ?g { >> > > >> > > ?s ?p ?o . >> > > >> > > BIND(<http://graphs/1> AS ?g) >> > > >> > > } >> > > >> > > } >> > > >> > > >> > > >> > > Obviously this is a somewhat odd query but it’s a >> simplification of the more general pattern of calculating the desired >> graph name inside of a GRAPH ?var block in order to produce only >> results from some subset of graphs >> > > >> > > >> > > >> > > When translate into quads form with Jena 3.x I get the >> following: >> > > >> > > >> > > >> > > (assign ((?g ?*g0)) >> > > >> > > (extend ((?g <http://graphs/1>)) >> > > >> > > (quadpattern (quad ?*g0 ?s ?p ?o)))) >> > > >> > > >> > > >> > > But with Jena 4.x I get this instead: >> > > >> > > >> > > >> > > (extend ((?g <http://graphs/1>)) >> > > >> > > (quadpattern (quad ?g ?s ?p ?o))) >> > > >> > > >> > > >> > > Note that in 3.x it rewrites the graph node in the inner >> quad pattern and uses assign to filter those after the extend but in >> 4.x it does not rewrite the graph node. While semantically these >> appear equivalent, and loading the test data into TDB 2 and uses >> tdb2.tdbquery to run a quad mode execution yields the same results in >> both cases, I do wonder if there’s a potential corner case here where >> scoping gets screwed up on more complex queries? >> > > >> > > >> > > >> > > It seems in general that Jena 4.x less aggressively >> rewrites the graph name when translating OpGraph into quads form and >> maybe that’s perfectly fine but just wanted to check if this was an >> intended change and whether anybody else had encountered this. >> > > >> > > >> > > >> > > For example rewriting the query to put the BIND first >> inside the GRAPH clause yields the following algebra on 3.x: >> > > >> > > >> > > >> > > (assign ((?g ?*g0)) >> > > >> > > (sequence >> > > >> > > (extend ((?g <http://graphs/1>)) >> > > >> > > (table unit)) >> > > >> > > (quadpattern (quad ?*g0 ?s ?p ?o)))) >> > > >> > > >> > > >> > > And on 4.x: >> > > >> > > >> > > >> > > (sequence >> > > >> > > (extend ((?g <http://graphs/1>)) >> > > >> > > (table unit)) >> > > >> > > (quadpattern (quad ?g ?s ?p ?o))) >> > > >> > > >> > > >> > > So again I think semantically equivalent but not >> rewriting the graph name. >> > > >> > > >> > > >> > > Any thoughts/opinions on this? >> > > >> > > >> > > >> > > Cheers, >> > > >> > > >> > > >> > > Rob >> > > >> > > >> > >> > >> > >> > >> >> >> >>