Rob Vesse created JENA-779:
------------------------------

             Summary: Filter placement should be able to break up extend
                 Key: JENA-779
                 URL: https://issues.apache.org/jira/browse/JENA-779
             Project: Apache Jena
          Issue Type: Improvement
          Components: ARQ, Optimizer
    Affects Versions: Jena 2.12.0
            Reporter: Rob Vesse
            Priority: Minor

The following query produces a plan, seen internally, that we consider sub-optimal:

{noformat}
SELECT DISTINCT ?domainName
{
  { ?uri ?p ?o }
  UNION
  { ?sub ?p ?uri FILTER(isIRI(?uri)) }
  BIND(str(?uri) as ?s)
  FILTER(STRSTARTS(?s, "http://"))
  BIND(IRI(CONCAT("http://", STRBEFORE(SUBSTR(?s,8), "/"))) AS ?domainName)
}
{noformat}

ARQ optimises it as follows:

{noformat}
(distinct
  (project (?domainName)
    (filter (strstarts ?s "http://")
      (extend ((?s (str ?uri)) (?domainName (iri (concat "http://" (strbefore (substr ?s 8) "/")))))
        (union
          (bgp (triple ?uri ?p ?o))
          (filter (isIRI ?uri)
            (bgp (triple ?sub ?p ?uri))))))))
{noformat}

This makes the query engine do a lot of unnecessary work. It computes both {{BIND}} expressions for every candidate solution, even though many of those solutions are then rejected by the filter; for the rejected ones only the first, simple {{BIND}} needed to be computed.

It would be better if the query was planned as follows:

{noformat}
(distinct
  (project (?domainName)
    (extend (?domainName (iri (concat "http://" (strbefore (substr ?s 8) "/"))))
      (filter (strstarts ?s "http://")
        (extend (?s (str ?uri))
          (union
            (bgp (triple ?uri ?p ?o))
            (filter (isIRI ?uri)
              (bgp (triple ?sub ?p ?uri)))))))))
{noformat}

Essentially, when we try to push a filter through an {{extend}} and determine that we cannot push it all the way through, we should check whether we can split the {{extend}} instead. The filter is then placed between the two halves, which gives a partial push: the assignments the filter depends on stay below it, and the remaining assignments are evaluated only for solutions that pass the filter.

Note that a user can rewrite the original query to yield this plan by making the second {{BIND}} a project expression, like so:

{noformat}
SELECT DISTINCT (IRI(CONCAT("http://", STRBEFORE(SUBSTR(?s,8), "/"))) AS ?domainName)
{
  { ?uri ?p ?o }
  UNION
  { ?sub ?p ?uri FILTER(isIRI(?uri)) }
  BIND(str(?uri) as ?s)
  FILTER(STRSTARTS(?s, "http://"))
}
{noformat}
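
To make the proposed splitting rule concrete, here is a minimal sketch in Python. It is a toy model only: it does not use any Jena classes and is not how ARQ implements filter placement. The names {{Assign}}, {{split_extend}} and {{plan}} are hypothetical, and the sketch assumes an {{extend}} can be modelled as an ordered list of (variable, expression) pairs in which later assignments may refer to earlier ones.

```python
from typing import NamedTuple


class Assign(NamedTuple):
    """One assignment inside an extend (hypothetical toy type)."""
    var: str   # variable being bound, e.g. "?s"
    expr: str  # expression text, kept only for printing


def split_extend(assigns, filter_vars):
    """Split an ordered assignment list around a filter.

    Returns (inner, outer). `inner` is the shortest prefix containing every
    assignment whose variable the filter mentions, so it must stay below the
    filter. `outer` is the rest and can be evaluated after the filter.
    Taking a prefix keeps the original order, so an assignment in `inner`
    that reads an earlier assignment still finds it bound.
    """
    cut = 0
    for i, a in enumerate(assigns):
        if a.var in filter_vars:
            cut = i + 1  # the filter needs everything up to and including i
    return assigns[:cut], assigns[cut:]


def fmt(assigns):
    """Render assignments in an SSE-like form."""
    return " ".join(f"({a.var} {a.expr})" for a in assigns)


def plan(assigns, filter_expr, filter_vars, sub_op):
    """Build an SSE-like plan string with the filter pushed as far as it goes."""
    inner, outer = split_extend(assigns, filter_vars)
    op = sub_op
    if inner:
        op = f"(extend ({fmt(inner)}) {op})"
    op = f"(filter {filter_expr} {op})"
    if outer:
        op = f"(extend ({fmt(outer)}) {op})"
    return op


# The two BINDs from the query in this issue, in query order.
ASSIGNS = [
    Assign("?s", "(str ?uri)"),
    Assign("?domainName",
           '(iri (concat "http://" (strbefore (substr ?s 8) "/")))'),
]

if __name__ == "__main__":
    print(plan(ASSIGNS, '(strstarts ?s "http://")', {"?s"}, "(union ...)"))
```

With the filter on {{?s}}, the sketch keeps the {{?s}} assignment below the filter and moves the {{?domainName}} assignment above it. If the filter mentions none of the assigned variables, {{inner}} is empty and the filter goes fully below the {{extend}}; if it mentions the last assigned variable, {{outer}} is empty and no split is possible.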