Actually, what I suggested is not quite correct it. It should be line 130 which should be changed from
if ( op instanceof OpJoin ) to if ( op instanceof OpJoin || op instanceof OpUnion) since there is no special requirement on unions as far as I can see (so line 137 should be left alone) Simon From: Simon Helsen/Toronto/IBM@IBMCA To: dev@jena.apache.org Date: 08/10/2012 05:12 PM Subject: TransformFilterEquality hi guys, so, I patched Jena 2.7.1 with the fix for https://issues.apache.org/jira/browse/JENA-283 and I was running again into query optimization regressions. So, I started debugging this and the problem we faced was caused by the occurrence of a UNION inside an OPTIONAL which blocked the filter from being pushed down because the code does not consider OpUnion operators and thus assumes it cannot optimize as soon as it finds this kind of operator The way to fix this seems to be to change line 137 (on HEAD) if ( op instanceof OpConditional || op instanceof OpLeftJoin) into if ( op instanceof OpConditional || op instanceof OpLeftJoin || op instanceof OpUnion) but although I think this is safe, I would like to know from someone more experienced if this is really safe and wouldn't change the query semantics. More generally, I wonder why this particular condition cannot be changed to (op instanceof Op2) Here is a sample query which caused a problem: PREFIX pr1: <http://foo/bar> PREFIX pr2: <http://foo/baz> SELECT ?R1 ?R1_v1 ?R1_v2 WHERE { ?R1 pr1:type pr2:t1 FILTER ( ?R1 = <https://host/test/r1> || ?R1 = <https://host/test/r2> || ?R1 = <https://host/test/r3> || ?R1 = <https://host/test/r4> || ?R1 = <https://host/test/r5> || ?R1 = <https://host/test/r6> || ?R1 = <https://host/test/r7> || ?R1 = <https://host/test/r8> || ?R1 = <https://host/test/r9> || ?R1 = <https://host/test/r10> ) OPTIONAL { { ?R1 pr1:identifier ?R1_v1 } UNION { ?R1 pr2:boundArtifact ?R1_v2 } } } Without the fix, I got the following plan: (prefix ((pr2: <http://foo/baz>) (pr1: <http://foo/bar>)) (project (?R1 ?R1_v1 ?R1_v2) (filter (|| (|| (|| (|| (|| (|| (|| (|| (|| (= ?R1 < https://host/test/r1>) (= ?R1 <https://host/test/r2>)) (= ?R1 < https://host/test/r3>)) (= ?R1 <https://host/test/r4>)) (= ?R1 < https://host/test/r5>)) (= ?R1 <https://host/test/r6>)) (= ?R1 < https://host/test/r7>)) (= ?R1 <https://host/test/r8>)) (= ?R1 < https://host/test/r9>)) (= ?R1 <https://host/test/r10>)) (conditional (bgp (triple ?R1 pr1:type pr2:t1)) (union (bgp (triple ?R1 pr1:identifier ?R1_v1)) (bgp (triple ?R1 pr2:boundArtifact ?R1_v2))))))) With the fix, I got the following plan, which is what we expect (the performance difference on a large repo is massive, i.e. a few hundred ms versus 3 minutes !) (prefix ((pr2: <http://foo/baz>) (pr1: <http://foo/bar>)) (project (?R1 ?R1_v1 ?R1_v2) (disjunction (assign ((?R1 <https://host/test/r1>)) (conditional (bgp (triple <https://host/test/r1> pr1:type pr2:t1)) (union (bgp (triple <https://host/test/r1> pr1:identifier ?R1_v1)) (bgp (triple <https://host/test/r1> pr2:boundArtifact ?R1_v2))))) (assign ((?R1 <https://host/test/r2>)) (conditional (bgp (triple <https://host/test/r2> pr1:type pr2:t1)) (union (bgp (triple <https://host/test/r2> pr1:identifier ?R1_v1)) (bgp (triple <https://host/test/r2> pr2:boundArtifact ?R1_v2))))) (assign ((?R1 <https://host/test/r3>)) (conditional (bgp (triple <https://host/test/r3> pr1:type pr2:t1)) (union (bgp (triple <https://host/test/r3> pr1:identifier ?R1_v1)) (bgp (triple <https://host/test/r3> pr2:boundArtifact ?R1_v2))))) (assign ((?R1 <https://host/test/r4>)) (conditional (bgp (triple <https://host/test/r4> pr1:type pr2:t1)) (union (bgp (triple <https://host/test/r4> pr1:identifier ?R1_v1)) (bgp (triple <https://host/test/r4> pr2:boundArtifact ?R1_v2))))) (assign ((?R1 <https://host/test/r5>)) (conditional (bgp (triple <https://host/test/r5> pr1:type pr2:t1)) (union (bgp (triple <https://host/test/r5> pr1:identifier ?R1_v1)) (bgp (triple <https://host/test/r5> pr2:boundArtifact ?R1_v2))))) (assign ((?R1 <https://host/test/r6>)) (conditional (bgp (triple <https://host/test/r6> pr1:type pr2:t1)) (union (bgp (triple <https://host/test/r6> pr1:identifier ?R1_v1)) (bgp (triple <https://host/test/r6> pr2:boundArtifact ?R1_v2))))) (assign ((?R1 <https://host/test/r7>)) (conditional (bgp (triple <https://host/test/r7> pr1:type pr2:t1)) (union (bgp (triple <https://host/test/r7> pr1:identifier ?R1_v1)) (bgp (triple <https://host/test/r7> pr2:boundArtifact ?R1_v2))))) (assign ((?R1 <https://host/test/r8>)) (conditional (bgp (triple <https://host/test/r8> pr1:type pr2:t1)) (union (bgp (triple <https://host/test/r8> pr1:identifier ?R1_v1)) (bgp (triple <https://host/test/r8> pr2:boundArtifact ?R1_v2))))) (assign ((?R1 <https://host/test/r9>)) (conditional (bgp (triple <https://host/test/r9> pr1:type pr2:t1)) (union (bgp (triple <https://host/test/r9> pr1:identifier ?R1_v1)) (bgp (triple <https://host/test/r9> pr2:boundArtifact ?R1_v2))))) (assign ((?R1 <https://host/test/r10>)) (conditional (bgp (triple <https://host/test/r10> pr1:type pr2:t1)) (union (bgp (triple <https://host/test/r10> pr1:identifier ?R1_v1)) (bgp (triple <https://host/test/r10> pr2:boundArtifact ?R1_v2)))))))) Let me know if this is a real issue so I can open an issue with the patch so it can be fixed for 2.7.4 thanks Simon