[
https://issues.apache.org/jira/browse/JENA-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129869#comment-14129869
]
Rob Vesse edited comment on JENA-779 at 9/11/14 10:51 AM:
----------------------------------------------------------
OK, that makes perfect sense. If there are broader bugs I'd prefer you to spend
some time figuring them out, since you are by far the most familiar with that
code.
Having the compound form is definitely useful overall, but it creates complexity
if introduced too soon. Keeping it and having {{TransformExtendCombine}}
introduce it later in the optimisation process seems like the appropriate
solution.
was (Author: rvesse):
Ok that makes perfect sense, if there are broader bugs I'd prefer you to spend
some time figuring them out since you are by far the most familiar with that
code.
Yes the compound form is definitely useful overall but creates complexity if
introduced too soon, and having {{TransformExtendCombine}} be able to do the
combining later seems like the appropriate solution.
> Filter placement should be able to break up extend
> --------------------------------------------------
>
> Key: JENA-779
> URL: https://issues.apache.org/jira/browse/JENA-779
> Project: Apache Jena
> Issue Type: Improvement
> Components: ARQ, Optimizer
> Affects Versions: Jena 2.12.0
> Reporter: Rob Vesse
> Priority: Minor
> Attachments: JENA-779-filter-extend-extend,
> JENA-779-single-extend.patch, JENA-779.patch
>
>
> The following query produces a query plan, seen internally, which is
> considered sub-optimal:
> {noformat}
> SELECT DISTINCT ?domainName
> {
>   { ?uri ?p ?o }
>   UNION
>   {
>     ?sub ?p ?uri
>     FILTER(isIRI(?uri))
>   }
>   BIND(str(?uri) AS ?s)
>   FILTER(STRSTARTS(?s, "http://"))
>   BIND(IRI(CONCAT("http://", STRBEFORE(SUBSTR(?s,8), "/"))) AS ?domainName)
> }
> {noformat}
> Which ARQ optimises as follows:
> {noformat}
> (distinct
>   (project (?domainName)
>     (filter (strstarts ?s "http://")
>       (extend ((?s (str ?uri))
>                (?domainName (iri (concat "http://" (strbefore (substr ?s 8) "/")))))
>         (union
>           (bgp (triple ?uri ?p ?o))
>           (filter (isIRI ?uri)
>             (bgp (triple ?sub ?p ?uri))))))))
> {noformat}
> This makes the query engine do a lot of work, because it computes both
> {{BIND}} expressions for many possible solutions that the filter will then
> reject, when for many of them it would only be necessary to compute the
> first, simple {{BIND}} expression.
> It would be better if the query was planned as follows:
> {noformat}
> (distinct
>   (project (?domainName)
>     (extend (?domainName (iri (concat "http://" (strbefore (substr ?s 8) "/"))))
>       (filter (strstarts ?s "http://")
>         (extend (?s (str ?uri))
>           (union
>             (bgp (triple ?uri ?p ?o))
>             (filter (isIRI ?uri)
>               (bgp (triple ?sub ?p ?uri)))))))))
> {noformat}
> Essentially, when we try to push a filter through an {{extend}} and
> determine that we cannot push it through the whole {{extend}}, we should see
> whether we can split the {{extend}} instead, resulting in a partial push.
> Note that a user can re-write the original query to yield this plan if they
> make the second {{BIND}} a project expression like so:
> {noformat}
> SELECT DISTINCT (IRI(CONCAT("http://", STRBEFORE(SUBSTR(?s,8), "/"))) AS ?domainName)
> {
>   { ?uri ?p ?o }
>   UNION
>   {
>     ?sub ?p ?uri
>     FILTER(isIRI(?uri))
>   }
>   BIND(str(?uri) AS ?s)
>   FILTER(STRSTARTS(?s, "http://"))
> }
> {noformat}
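
The splitting idea in the description above can be sketched in a few lines.
This is only a toy model in Python, not ARQ code: the function names
split_extend and expr_vars, the representation of an extend as an ordered list
of (variable, expression-string) pairs, and the regex-based variable
extraction are all assumptions made for illustration. It relies on the fact
that assignments in an extend are applied in order, so any prefix of them can
be peeled off into an inner extend.

```python
import re


def expr_vars(expr):
    """Return the set of ?variables mentioned in an expression string.

    Hypothetical helper: a real optimiser would walk the expression tree
    rather than use a regex.
    """
    return set(re.findall(r"\?\w+", expr))


def split_extend(assignments, filter_expr, bound_below):
    """Split an ordered extend around a filter (illustrative sketch only).

    assignments : list of (var, expr) pairs, in evaluation order
    filter_expr : the filter expression as a string
    bound_below : variables already bound by the sub-operator

    Returns (inner, outer), where `inner` is the shortest prefix of
    assignments after which every variable used by the filter is bound and
    `outer` is the remainder to apply above the filter. Returns None if the
    filter's variables are never all bound, i.e. no split helps.
    """
    needed = expr_vars(filter_expr)
    bound = set(bound_below)
    if needed <= bound:
        # The filter does not depend on the extend at all.
        return [], list(assignments)
    for i, (var, _expr) in enumerate(assignments):
        bound.add(var)
        if needed <= bound:
            return list(assignments[: i + 1]), list(assignments[i + 1:])
    return None


# The extend, filter and sub-operator variables from the query in this issue.
assignments = [
    ("?s", "str(?uri)"),
    ("?domainName", 'iri(concat("http://", strbefore(substr(?s, 8), "/")))'),
]
filter_expr = 'strstarts(?s, "http://")'
bound_below = {"?uri", "?p", "?o", "?sub"}

inner, outer = split_extend(assignments, filter_expr, bound_below)
print("inner extend (below the filter):", inner)
print("outer extend (above the filter):", outer)
```

With the inputs above, only the ?s assignment ends up below the filter and the
?domainName assignment stays above it, which matches the shape of the
preferred plan in the description.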