hi guys,

so, I patched Jena 2.7.1 with the fix for 
https://issues.apache.org/jira/browse/JENA-283 and I was running again 
into query optimization regressions. So, I started debugging this and the 
problem we faced was caused by the occurrence of a UNION inside an 
OPTIONAL which blocked the filter from being pushed down because the code 
does not consider OpUnion operators and thus assumes it cannot optimize as 
soon as it finds this kind of operator

The way to fix this seems to be to change line 137 (on HEAD)

if ( op instanceof OpConditional || op instanceof OpLeftJoin)

into 

if ( op instanceof OpConditional || op instanceof OpLeftJoin || op 
instanceof OpUnion)

but although I think this is safe, I would like to know from someone more 
experienced if this is really safe and wouldn't change the query 
semantics. More generally, I wonder why this particular condition cannot 
be changed to (op instanceof Op2)

Here is a sample query which caused a problem:

PREFIX pr1: <http://foo/bar>
PREFIX pr2: <http://foo/baz>
SELECT ?R1 ?R1_v1 ?R1_v2
WHERE
{ ?R1 pr1:type pr2:t1
FILTER ( ?R1 = <https://host/test/r1>  || 
         ?R1 = <https://host/test/r2>   || 
                 ?R1 = <https://host/test/r3>  || 
                 ?R1 = <https://host/test/r4>  || 
                 ?R1 = <https://host/test/r5>  || 
                 ?R1 = <https://host/test/r6>  || 
                 ?R1 = <https://host/test/r7>  || 
                 ?R1 = <https://host/test/r8>  || 
                 ?R1 = <https://host/test/r9>  || 
                 ?R1 = <https://host/test/r10>  )
OPTIONAL
{ { ?R1 pr1:identifier ?R1_v1 }
UNION
{ ?R1 pr2:boundArtifact ?R1_v2
}
}
}

Without the fix, I got the following plan:

(prefix ((pr2: <http://foo/baz>)
         (pr1: <http://foo/bar>))
  (project (?R1 ?R1_v1 ?R1_v2)
    (filter (|| (|| (|| (|| (|| (|| (|| (|| (|| (= ?R1 <
https://host/test/r1>) (= ?R1 <https://host/test/r2>)) (= ?R1 <
https://host/test/r3>)) (= ?R1 <https://host/test/r4>)) (= ?R1 <
https://host/test/r5>)) (= ?R1 <https://host/test/r6>)) (= ?R1 <
https://host/test/r7>)) (= ?R1 <https://host/test/r8>)) (= ?R1 <
https://host/test/r9>)) (= ?R1 <https://host/test/r10>))
      (conditional
        (bgp (triple ?R1 pr1:type pr2:t1))
        (union
          (bgp (triple ?R1 pr1:identifier ?R1_v1))
          (bgp (triple ?R1 pr2:boundArtifact ?R1_v2)))))))

With the fix, I got the following plan, which is what we expect (the 
performance difference on a large repo is massive, i.e. a few hundred ms 
versus 3 minutes !)

(prefix ((pr2: <http://foo/baz>)
         (pr1: <http://foo/bar>))
  (project (?R1 ?R1_v1 ?R1_v2)
    (disjunction
      (assign ((?R1 <https://host/test/r1>))
        (conditional
          (bgp (triple <https://host/test/r1> pr1:type pr2:t1))
          (union
            (bgp (triple <https://host/test/r1> pr1:identifier ?R1_v1))
            (bgp (triple <https://host/test/r1> pr2:boundArtifact 
?R1_v2)))))
      (assign ((?R1 <https://host/test/r2>))
        (conditional
          (bgp (triple <https://host/test/r2> pr1:type pr2:t1))
          (union
            (bgp (triple <https://host/test/r2> pr1:identifier ?R1_v1))
            (bgp (triple <https://host/test/r2> pr2:boundArtifact 
?R1_v2)))))
      (assign ((?R1 <https://host/test/r3>))
        (conditional
          (bgp (triple <https://host/test/r3> pr1:type pr2:t1))
          (union
            (bgp (triple <https://host/test/r3> pr1:identifier ?R1_v1))
            (bgp (triple <https://host/test/r3> pr2:boundArtifact 
?R1_v2)))))
      (assign ((?R1 <https://host/test/r4>))
        (conditional
          (bgp (triple <https://host/test/r4> pr1:type pr2:t1))
          (union
            (bgp (triple <https://host/test/r4> pr1:identifier ?R1_v1))
            (bgp (triple <https://host/test/r4> pr2:boundArtifact 
?R1_v2)))))
      (assign ((?R1 <https://host/test/r5>))
        (conditional
          (bgp (triple <https://host/test/r5> pr1:type pr2:t1))
          (union
            (bgp (triple <https://host/test/r5> pr1:identifier ?R1_v1))
            (bgp (triple <https://host/test/r5> pr2:boundArtifact 
?R1_v2)))))
      (assign ((?R1 <https://host/test/r6>))
        (conditional
          (bgp (triple <https://host/test/r6> pr1:type pr2:t1))
          (union
            (bgp (triple <https://host/test/r6> pr1:identifier ?R1_v1))
            (bgp (triple <https://host/test/r6> pr2:boundArtifact 
?R1_v2)))))
      (assign ((?R1 <https://host/test/r7>))
        (conditional
          (bgp (triple <https://host/test/r7> pr1:type pr2:t1))
          (union
            (bgp (triple <https://host/test/r7> pr1:identifier ?R1_v1))
            (bgp (triple <https://host/test/r7> pr2:boundArtifact 
?R1_v2)))))
      (assign ((?R1 <https://host/test/r8>))
        (conditional
          (bgp (triple <https://host/test/r8> pr1:type pr2:t1))
          (union
            (bgp (triple <https://host/test/r8> pr1:identifier ?R1_v1))
            (bgp (triple <https://host/test/r8> pr2:boundArtifact 
?R1_v2)))))
      (assign ((?R1 <https://host/test/r9>))
        (conditional
          (bgp (triple <https://host/test/r9> pr1:type pr2:t1))
          (union
            (bgp (triple <https://host/test/r9> pr1:identifier ?R1_v1))
            (bgp (triple <https://host/test/r9> pr2:boundArtifact 
?R1_v2)))))
      (assign ((?R1 <https://host/test/r10>))
        (conditional
          (bgp (triple <https://host/test/r10> pr1:type pr2:t1))
          (union
            (bgp (triple <https://host/test/r10> pr1:identifier ?R1_v1))
            (bgp (triple <https://host/test/r10> pr2:boundArtifact 
?R1_v2))))))))

Let me know if this is a real issue so I can open an issue with the patch 
so it can be fixed for 2.7.4

thanks

Simon

Reply via email to