hi guys, so, I patched Jena 2.7.1 with the fix for https://issues.apache.org/jira/browse/JENA-283 and I was running again into query optimization regressions. So, I started debugging this and the problem we faced was caused by the occurrence of a UNION inside an OPTIONAL which blocked the filter from being pushed down because the code does not consider OpUnion operators and thus assumes it cannot optimize as soon as it finds this kind of operator
The way to fix this seems to be to change line 137 (on HEAD) if ( op instanceof OpConditional || op instanceof OpLeftJoin) into if ( op instanceof OpConditional || op instanceof OpLeftJoin || op instanceof OpUnion) but although I think this is safe, I would like to know from someone more experienced if this is really safe and wouldn't change the query semantics. More generally, I wonder why this particular condition cannot be changed to (op instanceof Op2) Here is a sample query which caused a problem: PREFIX pr1: <http://foo/bar> PREFIX pr2: <http://foo/baz> SELECT ?R1 ?R1_v1 ?R1_v2 WHERE { ?R1 pr1:type pr2:t1 FILTER ( ?R1 = <https://host/test/r1> || ?R1 = <https://host/test/r2> || ?R1 = <https://host/test/r3> || ?R1 = <https://host/test/r4> || ?R1 = <https://host/test/r5> || ?R1 = <https://host/test/r6> || ?R1 = <https://host/test/r7> || ?R1 = <https://host/test/r8> || ?R1 = <https://host/test/r9> || ?R1 = <https://host/test/r10> ) OPTIONAL { { ?R1 pr1:identifier ?R1_v1 } UNION { ?R1 pr2:boundArtifact ?R1_v2 } } } Without the fix, I got the following plan: (prefix ((pr2: <http://foo/baz>) (pr1: <http://foo/bar>)) (project (?R1 ?R1_v1 ?R1_v2) (filter (|| (|| (|| (|| (|| (|| (|| (|| (|| (= ?R1 < https://host/test/r1>) (= ?R1 <https://host/test/r2>)) (= ?R1 < https://host/test/r3>)) (= ?R1 <https://host/test/r4>)) (= ?R1 < https://host/test/r5>)) (= ?R1 <https://host/test/r6>)) (= ?R1 < https://host/test/r7>)) (= ?R1 <https://host/test/r8>)) (= ?R1 < https://host/test/r9>)) (= ?R1 <https://host/test/r10>)) (conditional (bgp (triple ?R1 pr1:type pr2:t1)) (union (bgp (triple ?R1 pr1:identifier ?R1_v1)) (bgp (triple ?R1 pr2:boundArtifact ?R1_v2))))))) With the fix, I got the following plan, which is what we expect (the performance difference on a large repo is massive, i.e. a few hundred ms versus 3 minutes !) (prefix ((pr2: <http://foo/baz>) (pr1: <http://foo/bar>)) (project (?R1 ?R1_v1 ?R1_v2) (disjunction (assign ((?R1 <https://host/test/r1>)) (conditional (bgp (triple <https://host/test/r1> pr1:type pr2:t1)) (union (bgp (triple <https://host/test/r1> pr1:identifier ?R1_v1)) (bgp (triple <https://host/test/r1> pr2:boundArtifact ?R1_v2))))) (assign ((?R1 <https://host/test/r2>)) (conditional (bgp (triple <https://host/test/r2> pr1:type pr2:t1)) (union (bgp (triple <https://host/test/r2> pr1:identifier ?R1_v1)) (bgp (triple <https://host/test/r2> pr2:boundArtifact ?R1_v2))))) (assign ((?R1 <https://host/test/r3>)) (conditional (bgp (triple <https://host/test/r3> pr1:type pr2:t1)) (union (bgp (triple <https://host/test/r3> pr1:identifier ?R1_v1)) (bgp (triple <https://host/test/r3> pr2:boundArtifact ?R1_v2))))) (assign ((?R1 <https://host/test/r4>)) (conditional (bgp (triple <https://host/test/r4> pr1:type pr2:t1)) (union (bgp (triple <https://host/test/r4> pr1:identifier ?R1_v1)) (bgp (triple <https://host/test/r4> pr2:boundArtifact ?R1_v2))))) (assign ((?R1 <https://host/test/r5>)) (conditional (bgp (triple <https://host/test/r5> pr1:type pr2:t1)) (union (bgp (triple <https://host/test/r5> pr1:identifier ?R1_v1)) (bgp (triple <https://host/test/r5> pr2:boundArtifact ?R1_v2))))) (assign ((?R1 <https://host/test/r6>)) (conditional (bgp (triple <https://host/test/r6> pr1:type pr2:t1)) (union (bgp (triple <https://host/test/r6> pr1:identifier ?R1_v1)) (bgp (triple <https://host/test/r6> pr2:boundArtifact ?R1_v2))))) (assign ((?R1 <https://host/test/r7>)) (conditional (bgp (triple <https://host/test/r7> pr1:type pr2:t1)) (union (bgp (triple <https://host/test/r7> pr1:identifier ?R1_v1)) (bgp (triple <https://host/test/r7> pr2:boundArtifact ?R1_v2))))) (assign ((?R1 <https://host/test/r8>)) (conditional (bgp (triple <https://host/test/r8> pr1:type pr2:t1)) (union (bgp (triple <https://host/test/r8> pr1:identifier ?R1_v1)) (bgp (triple <https://host/test/r8> pr2:boundArtifact ?R1_v2))))) (assign ((?R1 <https://host/test/r9>)) (conditional (bgp (triple <https://host/test/r9> pr1:type pr2:t1)) (union (bgp (triple <https://host/test/r9> pr1:identifier ?R1_v1)) (bgp (triple <https://host/test/r9> pr2:boundArtifact ?R1_v2))))) (assign ((?R1 <https://host/test/r10>)) (conditional (bgp (triple <https://host/test/r10> pr1:type pr2:t1)) (union (bgp (triple <https://host/test/r10> pr1:identifier ?R1_v1)) (bgp (triple <https://host/test/r10> pr2:boundArtifact ?R1_v2)))))))) Let me know if this is a real issue so I can open an issue with the patch so it can be fixed for 2.7.4 thanks Simon