Rob Vesse created JENA-779:
------------------------------

             Summary: Filter placement should be able to break up extend
                 Key: JENA-779
                 URL: https://issues.apache.org/jira/browse/JENA-779
             Project: Apache Jena
          Issue Type: Improvement
          Components: ARQ, Optimizer
    Affects Versions: Jena 2.12.0
            Reporter: Rob Vesse
            Priority: Minor


The following query, seen internally, produces a query plan that is 
considered sub-optimal:

{noformat}
SELECT DISTINCT ?domainName
{
  { ?uri ?p ?o }
  UNION
  {
    ?sub ?p ?uri
    FILTER(isIRI(?uri))
  }
  BIND(str(?uri) as ?s)
  FILTER(STRSTARTS(?s, "http://"))
  BIND(IRI(CONCAT("http://", STRBEFORE(SUBSTR(?s,8), "/"))) AS ?domainName)
}
{noformat}

ARQ optimises this as follows:

{noformat}
(distinct
  (project (?domainName)
    (filter (strstarts ?s "http://")
      (extend ((?s (str ?uri)) (?domainName (iri (concat "http://" (strbefore (substr ?s 8) "/")))))
        (union
          (bgp (triple ?uri ?p ?o))
          (filter (isIRI ?uri)
            (bgp (triple ?sub ?p ?uri))))))))
{noformat}

This makes the query engine do a lot of unnecessary work: it computes both 
{{BIND}} expressions for every candidate solution, even though many of those 
solutions are then rejected by the filter. For the rejected solutions, only 
the first, simple {{BIND}} expression needed to be computed.

It would be better if the query was planned as follows:

{noformat}
(distinct
  (project (?domainName)
    (extend (?domainName (iri (concat "http://" (strbefore (substr ?s 8) "/"))))
      (filter (strstarts ?s "http://")
        (extend (?s (str ?uri))
          (union
            (bgp (triple ?uri ?p ?o))
            (filter (isIRI ?uri)
              (bgp (triple ?sub ?p ?uri)))))))))
{noformat}

Essentially, when we try to push a filter through an {{extend}} and determine 
that we cannot push it through the whole {{extend}}, we should check whether 
we can split the {{extend}} instead, which results in a partial push.
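
The splitting step can be sketched as follows. This is only an illustrative 
toy model in Python, not ARQ's optimizer code: the {{split_extend}} function 
and the {{Assignment}} type are hypothetical names, an {{extend}} is modelled 
as an ordered list of (variable, expression) pairs, and the filter is reduced 
to the set of variables it mentions. It assumes the filter only needs the 
assignments up to the last one that binds one of its variables.

```python
from typing import List, Set, Tuple

# One (variable, expression) pair of an extend. The expression is kept as an
# opaque string because only the variable names matter for placement here.
Assignment = Tuple[str, str]


def split_extend(assignments: List[Assignment],
                 filter_vars: Set[str]) -> Tuple[List[Assignment], List[Assignment]]:
    """Hypothetical helper: return (below, above).

    `below` holds the assignments that must stay under the filter because the
    filter mentions a variable they bind (plus anything listed before them, so
    the original order is preserved). `above` holds the remaining assignments,
    which can be evaluated after the filter has rejected solutions.
    """
    last_needed = -1
    for i, (var, _expr) in enumerate(assignments):
        if var in filter_vars:
            last_needed = i
    return assignments[:last_needed + 1], assignments[last_needed + 1:]


# The extend from the plan above, and the variables used by the filter.
extend = [
    ("?s", "(str ?uri)"),
    ("?domainName", '(iri (concat "http://" (strbefore (substr ?s 8) "/")))'),
]
below, above = split_extend(extend, {"?s"})
print("below filter:", [var for var, _ in below])
print("above filter:", [var for var, _ in above])
```

In this model an empty {{below}} means the filter can be pushed through the 
whole {{extend}}, and an empty {{above}} means no split is possible.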

Note that a user can re-write the original query to yield this plan if they 
make the second {{BIND}} a project expression like so:

{noformat}
SELECT DISTINCT (IRI(CONCAT("http://", STRBEFORE(SUBSTR(?s,8), "/"))) AS 
?domainName)
{
  { ?uri ?p ?o }
  UNION
  {
    ?sub ?p ?uri
    FILTER(isIRI(?uri))
  }
  BIND(str(?uri) as ?s)
  FILTER(STRSTARTS(?s, "http://"))
}
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
