[ 
https://issues.apache.org/jira/browse/JENA-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129869#comment-14129869
 ] 

Rob Vesse edited comment on JENA-779 at 9/11/14 10:51 AM:
----------------------------------------------------------

OK, that makes perfect sense. If there are broader bugs, I'd prefer you spend 
some time figuring them out, since you are by far the most familiar with that 
code.

Having the compound form is definitely useful overall, but it creates 
complexity if introduced too soon. Keeping it and having 
{{TransformExtendCombine}} introduce it later in the optimisation process 
seems like the appropriate solution.


was (Author: rvesse):
Ok that makes perfect sense, if there are broader bugs I'd prefer you to spend 
some time figuring them out since you are by far the most familiar with that 
code

Yes the compound form is definitely useful overall but creates complexity if 
introduced too soon and having {{TransformExtendCombine}} be able to do the 
combining later seems like the appropriate solution

> Filter placement should be able to break up extend
> --------------------------------------------------
>
>                 Key: JENA-779
>                 URL: https://issues.apache.org/jira/browse/JENA-779
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ, Optimizer
>    Affects Versions: Jena 2.12.0
>            Reporter: Rob Vesse
>            Priority: Minor
>         Attachments: JENA-779-filter-extend-extend, 
> JENA-779-single-extend.patch, JENA-779.patch
>
>
> The following query demonstrates a query plan, seen internally, which is 
> considered sub-optimal:
> {noformat}
> SELECT DISTINCT ?domainName
> {
>   { ?uri ?p ?o }
>   UNION
>   {
>     ?sub ?p ?uri
>     FILTER(isIRI(?uri))
>   }
>   BIND(str(?uri) as ?s)
>   FILTER(STRSTARTS(?s, "http://"))
>   BIND(IRI(CONCAT("http://", STRBEFORE(SUBSTR(?s,8), "/"))) AS ?domainName)
> }
> {noformat}
> Which ARQ optimises as follows:
> {noformat}
> (distinct
>   (project (?domainName)
>     (filter (strstarts ?s "http://")
>       (extend ((?s (str ?uri)) (?domainName (iri (concat "http://" (strbefore 
> (substr ?s 8) "/")))))
>         (union
>           (bgp (triple ?uri ?p ?o))
>           (filter (isIRI ?uri)
>             (bgp (triple ?sub ?p ?uri))))))))
> {noformat}
> This makes the query engine do a lot of work, because it computes both 
> {{BIND}} expressions for many possible solutions that are then rejected by 
> the filter. For those rejected solutions it would only have been necessary 
> to compute the first, simple {{BIND}} expression.
> It would be better if the query was planned as follows:
> {noformat}
> (distinct
>   (project (?domainName)
>     (extend (?domainName (iri (concat "http://" (strbefore (substr ?s 8) 
> "/"))))
>       (filter (strstarts ?s "http://")
>         (extend (?s (str ?uri))
>           (union
>             (bgp (triple ?uri ?p ?o))
>             (filter (isIRI ?uri)
>               (bgp (triple ?sub ?p ?uri)))))))))
> {noformat}
> Essentially, when we try to push a filter through an {{extend}} and 
> determine that we cannot push it all the way through, we should see whether 
> we can split the {{extend}} instead, resulting in a partial push of the 
> filter.
> Note that a user can re-write the original query to yield this plan if they 
> make the second {{BIND}} a project expression like so:
> {noformat}
> SELECT DISTINCT (IRI(CONCAT("http://", STRBEFORE(SUBSTR(?s,8), "/"))) AS 
> ?domainName)
> {
>   { ?uri ?p ?o }
>   UNION
>   {
>     ?sub ?p ?uri
>     FILTER(isIRI(?uri))
>   }
>   BIND(str(?uri) as ?s)
>   FILTER(STRSTARTS(?s, "http://"))
> }
> {noformat}
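
The splitting rule described in the issue can be sketched roughly as follows. 
This is a minimal illustration in Python under stated assumptions, not ARQ's 
implementation: {{Assign}}, {{split_extend}} and {{plan}} are hypothetical 
names, expressions are treated as opaque strings, and the caller supplies the 
set of variables the filter mentions. The idea is to keep, below the filter, 
only the leading assignments up to the last one that binds a variable the 
filter uses, and to move the remaining assignments above the filter.

```python
from typing import NamedTuple


class Assign(NamedTuple):
    """One (var, expr) pair of an extend; hypothetical type for this sketch."""
    var: str   # variable being bound, e.g. "?s"
    expr: str  # expression text, kept opaque here


def split_extend(assigns, filter_vars):
    """Split an ordered assignment list into (inner, outer).

    inner: assignments that must stay below the filter (everything up to
           and including the last one binding a variable the filter uses).
    outer: assignments that can be evaluated after the filter.
    Order is preserved, so later assignments may still depend on earlier ones.
    """
    last = -1
    for i, a in enumerate(assigns):
        if a.var in filter_vars:
            last = i
    return assigns[:last + 1], assigns[last + 1:]


def fmt(assigns):
    """Render assignments as SSE-like text (illustrative only)."""
    return " ".join("(%s %s)" % (a.var, a.expr) for a in assigns)


def plan(assigns, filter_expr, filter_vars, sub_op):
    """Build an SSE-like string with the filter pushed as far down as the
    split allows. Assumes the original plan was filter-over-extend."""
    inner, outer = split_extend(assigns, filter_vars)
    op = sub_op
    if inner:
        op = "(extend (%s) %s)" % (fmt(inner), op)
    op = "(filter %s %s)" % (filter_expr, op)
    if outer:
        op = "(extend (%s) %s)" % (fmt(outer), op)
    return op


# The two BINDs from the issue's query, in source order.
assigns = [
    Assign("?s", "(str ?uri)"),
    Assign("?domainName",
           '(iri (concat "http://" (strbefore (substr ?s 8) "/")))'),
]
print(plan(assigns, '(strstarts ?s "http://")', {"?s"}, "(union ...)"))
```

If no assignment binds a filter variable, {{inner}} is empty and the filter 
moves entirely below the extend; if the last assignment binds one, {{outer}} 
is empty and nothing is gained by splitting.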



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
