Rob Vesse created JENA-779:
------------------------------
Summary: Filter placement should be able to break up extend
Key: JENA-779
URL: https://issues.apache.org/jira/browse/JENA-779
Project: Apache Jena
Issue Type: Improvement
Components: ARQ, Optimizer
Affects Versions: Jena 2.12.0
Reporter: Rob Vesse
Priority: Minor
The following query, seen internally, produces a query plan that is
sub-optimal:
{noformat}
SELECT DISTINCT ?domainName
{
  { ?uri ?p ?o }
  UNION
  {
    ?sub ?p ?uri
    FILTER(isIRI(?uri))
  }
  BIND(str(?uri) as ?s)
  FILTER(STRSTARTS(?s, "http://"))
  BIND(IRI(CONCAT("http://", STRBEFORE(SUBSTR(?s,8), "/"))) AS ?domainName)
}
{noformat}
ARQ optimises this as follows:
{noformat}
(distinct
  (project (?domainName)
    (filter (strstarts ?s "http://")
      (extend ((?s (str ?uri)) (?domainName (iri (concat "http://" (strbefore (substr ?s 8) "/")))))
        (union
          (bgp (triple ?uri ?p ?o))
          (filter (isIRI ?uri)
            (bgp (triple ?sub ?p ?uri))))))))
{noformat}
This makes the query engine do a lot of unnecessary work: it computes both
{{BIND}} expressions for every candidate solution, even though the {{FILTER}}
then rejects many of them. For the rejected solutions only the first, simple
{{BIND}} expression needed to be computed.
It would be better if the query was planned as follows:
{noformat}
(distinct
  (project (?domainName)
    (extend (?domainName (iri (concat "http://" (strbefore (substr ?s 8) "/"))))
      (filter (strstarts ?s "http://")
        (extend (?s (str ?uri))
          (union
            (bgp (triple ?uri ?p ?o))
            (filter (isIRI ?uri)
              (bgp (triple ?sub ?p ?uri)))))))))
{noformat}
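To make the difference between the two plans concrete, here is a rough Python sketch that mimics both evaluation orders and counts how often each {{BIND}} expression is computed. It is only an illustration, not ARQ code: the sample {{?uri}} values and the function names ({{domain_name}}, {{current_plan}}, {{proposed_plan}}) are made up for this example.

```python
# Hypothetical ?uri bindings produced by the union (illustrative data only).
URIS = [
    "http://example.org/a",
    "http://example.org/b",
    "urn:isbn:0451450523",
    "mailto:someone@example.org",
    "https://example.net/c",
]


def domain_name(s, counts):
    """Second BIND: IRI(CONCAT("http://", STRBEFORE(SUBSTR(?s, 8), "/")))."""
    counts["domainName"] += 1
    rest = s[7:]  # SPARQL SUBSTR is 1-based, so SUBSTR(?s, 8) skips 7 characters
    before, sep, _ = rest.partition("/")
    # STRBEFORE yields the empty string when the separator does not occur
    return "http://" + (before if sep else "")


def current_plan(uris):
    """filter over a single extend: both BINDs run for every solution."""
    counts = {"s": 0, "domainName": 0}
    results = set()  # DISTINCT
    for uri in uris:
        counts["s"] += 1
        s = str(uri)
        d = domain_name(s, counts)  # computed before the filter is applied
        if s.startswith("http://"):
            results.add(d)
    return results, counts


def proposed_plan(uris):
    """extend / filter / extend: the second BIND runs only for survivors."""
    counts = {"s": 0, "domainName": 0}
    results = set()  # DISTINCT
    for uri in uris:
        counts["s"] += 1
        s = str(uri)
        if not s.startswith("http://"):
            continue  # rejected before the expensive BIND is evaluated
        results.add(domain_name(s, counts))
    return results, counts
```

With this sample data both functions return the same result set, but the first evaluates the {{?domainName}} expression 5 times and the second only 2 times.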
Essentially, when we try to push a filter through an {{extend}} and determine
that it cannot be pushed through the whole {{extend}}, we should check whether
the {{extend}} can be split instead, so that the filter is pushed part of the
way down.
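One possible way to decide where to split is sketched below in Python. This is a simplified model and not how ARQ's optimizer is actually implemented: it assumes an {{extend}} can be represented as an ordered list of (variable, variables used by its expression) pairs, and the function name {{split_extend}} is hypothetical.

```python
def split_extend(assignments, filter_vars):
    """Split an extend's ordered assignments around a filter.

    assignments: ordered list of (var, vars_used_by_expr) pairs.
    filter_vars: set of variables mentioned by the filter expression.

    Returns (below, above): the assignments that must be evaluated before
    the filter, and the ones that can be deferred until after it.
    """
    last_needed = -1
    for i, (var, _used) in enumerate(assignments):
        if var in filter_vars:
            last_needed = i  # the filter depends on this assignment
    # Assignments are ordered, so keeping a whole prefix below the filter
    # also keeps everything that the needed assignments depend on.
    below = assignments[: last_needed + 1]
    above = assignments[last_needed + 1:]
    return below, above


# The extend from the query above: ?s is defined first, then ?domainName.
extend = [("s", {"uri"}), ("domainName", {"s"})]
below, above = split_extend(extend, {"s"})
```

For the example, {{below}} holds only the {{?s}} assignment and {{above}} holds the {{?domainName}} assignment, which corresponds to the desired plan. If {{below}} comes back empty the filter can be pushed completely beneath the {{extend}}, and if {{above}} comes back empty no split helps.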
Note that a user can re-write the original query to yield this plan if they
make the second {{BIND}} a project expression like so:
{noformat}
SELECT DISTINCT (IRI(CONCAT("http://", STRBEFORE(SUBSTR(?s,8), "/"))) AS ?domainName)
{
  { ?uri ?p ?o }
  UNION
  {
    ?sub ?p ?uri
    FILTER(isIRI(?uri))
  }
  BIND(str(?uri) as ?s)
  FILTER(STRSTARTS(?s, "http://"))
}
{noformat}