[ https://issues.apache.org/jira/browse/JENA-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126887#comment-14126887 ]

Rob Vesse edited comment on JENA-779 at 9/9/14 11:20 AM:
---------------------------------------------------------

Although the first {{BIND}} ends the group graph pattern, it also means that
the algebra for the first part of the group has {{extend}} as its outermost
operator.  ARQ then processes the next item in the group, which is another
{{ElementBind}}, so it calls {{OpExtend.extend()}}, which combines the two
into a single {{extend}} operator.  Finally, it adds the filters for the
group, which is why we get the basic plan we see.

This plan is perfectly valid according to the SPARQL specification, AFAIK.
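The translation order described in this comment can be modelled with a small toy. This is a minimal sketch in Python, not ARQ's actual (Java) code: the classes `Pattern`, `Extend`, `Filter` and the functions `add_binding` and `translate_group` are hypothetical, joins are not modelled, and `add_binding` merely imitates the merging behaviour the comment attributes to `OpExtend.extend()`. Group filters are collected and wrapped around the whole group last, as the SPARQL algebra translation does.

```python
from dataclasses import dataclass

# Hypothetical toy algebra nodes, loosely mirroring ARQ's SSE operators.
@dataclass(frozen=True)
class Pattern:
    name: str  # stands in for an already-translated sub-plan, e.g. the (union ...)

@dataclass(frozen=True)
class Extend:
    bindings: tuple  # ordered (var, expr) pairs
    sub: object

@dataclass(frozen=True)
class Filter:
    exprs: tuple  # filter expressions, applied over the whole group
    sub: object

def add_binding(op, var, expr):
    """Append one BIND. If op is already an extend, merge into it instead of
    nesting a new extend (imitating the merging described for OpExtend.extend())."""
    if isinstance(op, Extend):
        return Extend(op.bindings + ((var, expr),), op.sub)
    return Extend(((var, expr),), op)

def translate_group(elements):
    """Translate group elements in order; FILTERs are collected and wrapped last."""
    op = None
    filters = []
    for kind, *args in elements:
        if kind == "pattern":
            if op is not None:
                raise NotImplementedError("joins are not modelled in this sketch")
            op = Pattern(args[0])
        elif kind == "bind":
            op = add_binding(op, args[0], args[1])
        elif kind == "filter":
            filters.append(args[0])
        else:
            raise ValueError(f"unknown element kind: {kind}")
    return Filter(tuple(filters), op) if filters else op

# The group from the issue: union, BIND, FILTER, BIND.
plan = translate_group([
    ("pattern", "union"),
    ("bind", "?s", "(str ?uri)"),
    ("filter", '(strstarts ?s "http://")'),
    ("bind", "?domainName", '(iri (concat "http://" (strbefore (substr ?s 8) "/")))'),
])
print(plan)
```

Because the second {{BIND}} arrives while the plan's root is already an {{extend}}, both bindings end up in one operator, and the single filter sits above it, matching the shape of the plan reported in the issue.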


was (Author: rvesse):
Although the first {{BIND}} ends the group graph pattern, it also means that
the algebra for the first group has {{extend}} as its outermost operator.
ARQ then processes the next item in the group, which is another
{{ElementBind}}, so it calls {{OpExtend.extend()}}, which legitimately
combines the two.  Finally, it adds the filters for the group, which is why
we get the basic plan we see.

This plan is perfectly valid according to the SPARQL specification, AFAIK.

> Filter placement should be able to break up extend
> --------------------------------------------------
>
>                 Key: JENA-779
>                 URL: https://issues.apache.org/jira/browse/JENA-779
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ, Optimizer
>    Affects Versions: Jena 2.12.0
>            Reporter: Rob Vesse
>            Priority: Minor
>         Attachments: JENA-779.patch
>
>
> The following query demonstrates a query plan seen internally which is 
> considered sub-optimal.
> Consider the following query:
> {noformat}
> SELECT DISTINCT ?domainName
> {
>   { ?uri ?p ?o }
>   UNION
>   {
>     ?sub ?p ?uri
>     FILTER(isIRI(?uri))
>   }
>   BIND(str(?uri) as ?s)
>   FILTER(STRSTARTS(?s, "http://"))
>   BIND(IRI(CONCAT("http://", STRBEFORE(SUBSTR(?s,8), "/"))) AS ?domainName)
> }
> {noformat}
> Which ARQ optimises as follows:
> {noformat}
> (distinct
>   (project (?domainName)
>     (filter (strstarts ?s "http://")
>       (extend ((?s (str ?uri)) (?domainName (iri (concat "http://" (strbefore (substr ?s 8) "/")))))
>         (union
>           (bgp (triple ?uri ?p ?o))
>           (filter (isIRI ?uri)
>             (bgp (triple ?sub ?p ?uri))))))))
> {noformat}
> This makes the query engine do a lot of work, because it computes both
> {{BIND}} expressions for many candidate solutions that the filter then
> rejects, even though those solutions only need the first, simpler {{BIND}}
> expression in order to evaluate the filter.
> It would be better if the query was planned as follows:
> {noformat}
> (distinct
>   (project (?domainName)
>     (extend (?domainName (iri (concat "http://" (strbefore (substr ?s 8) "/"))))
>       (filter (strstarts ?s "http://")
>         (extend (?s (str ?uri))
>           (union
>             (bgp (triple ?uri ?p ?o))
>             (filter (isIRI ?uri)
>               (bgp (triple ?sub ?p ?uri)))))))))
> {noformat}
> Essentially, when we try to push a filter through an {{extend}} and
> determine that we cannot push it through the whole {{extend}}, we should
> check whether we can split the {{extend}} instead, resulting in a partial push.
> Note that a user can re-write the original query to yield this plan if they 
> make the second {{BIND}} a project expression like so:
> {noformat}
> SELECT DISTINCT (IRI(CONCAT("http://", STRBEFORE(SUBSTR(?s,8), "/"))) AS ?domainName)
> {
>   { ?uri ?p ?o }
>   UNION
>   {
>     ?sub ?p ?uri
>     FILTER(isIRI(?uri))
>   }
>   BIND(str(?uri) as ?s)
>   FILTER(STRSTARTS(?s, "http://"))
> }
> {noformat}
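The partial push proposed in the issue description above can be sketched as a rewrite on a toy algebra. This is a minimal Python sketch, assuming each expression records the variables it mentions; it is not ARQ's filter placement code, and the names `Expr`, `Pattern`, `Extend`, `Filter` and `push_filter_into_extend` are hypothetical. The idea: find the last binding in the {{extend}} whose variable the filter uses, keep that prefix below the filter, and lift the remaining bindings above it. Binding order is preserved, so later bindings that depend on earlier ones (here {{?domainName}} on {{?s}}) still see them.

```python
from dataclasses import dataclass

# Hypothetical toy algebra; each expression records the variables it mentions.
@dataclass(frozen=True)
class Expr:
    text: str
    vars: frozenset

@dataclass(frozen=True)
class Pattern:
    name: str

@dataclass(frozen=True)
class Extend:
    bindings: tuple  # ordered (var, Expr) pairs; later ones may use earlier vars
    sub: object

@dataclass(frozen=True)
class Filter:
    exprs: tuple  # tuple of Expr
    sub: object

def push_filter_into_extend(op):
    """Rewrite filter(extend(b1..bn, sub)) as
    extend(b(k+1)..bn, filter(extend(b1..bk, sub))),
    where bk is the last binding whose variable the filter mentions."""
    if not (isinstance(op, Filter) and isinstance(op.sub, Extend)):
        return op
    ext = op.sub
    needed = set().union(*(e.vars for e in op.exprs))
    bound = [var for var, _ in ext.bindings]
    # Number of leading bindings the filter must stay above (0 if it uses none).
    cut = max((i + 1 for i, v in enumerate(bound) if v in needed), default=0)
    if cut == len(bound):
        return op  # the filter needs the last binding: nothing can be split off
    # Keep the prefix below the filter; if it is empty the filter sits directly on sub.
    lower_sub = Extend(ext.bindings[:cut], ext.sub) if cut else ext.sub
    return Extend(ext.bindings[cut:], Filter(op.exprs, lower_sub))

uri_s = Expr("(str ?uri)", frozenset({"?uri"}))
domain = Expr('(iri (concat "http://" (strbefore (substr ?s 8) "/")))', frozenset({"?s"}))
starts = Expr('(strstarts ?s "http://")', frozenset({"?s"}))

before = Filter((starts,), Extend((("?s", uri_s), ("?domainName", domain)), Pattern("union")))
after = push_filter_into_extend(before)
print(after)
```

Applied to the plan ARQ currently produces, the rewrite yields the nesting of the desired plan: {{extend}} for {{?domainName}}, then the filter, then {{extend}} for {{?s}} over the union. When the filter needs every binding, the operator is returned unchanged.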

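The cost argument in the issue description can be illustrated by counting how often the {{?domainName}} expression is evaluated under each plan shape. This is a toy simulation, assuming plain Python strings stand in for the RDF terms bound to {{?uri}} (so {{str(?uri)}} is modelled as the identity), and that {{DISTINCT}} plus {{project}} are modelled by collecting {{?domainName}} values into a set. The function names are hypothetical and the counts are not measurements of ARQ.

```python
def domain_name(s):
    """IRI(CONCAT("http://", STRBEFORE(SUBSTR(?s, 8), "/"))) on plain strings.
    SPARQL's SUBSTR is 1-based, so SUBSTR(?s, 8) is s[7:]; STRBEFORE yields ""
    when the separator is absent."""
    head, sep, _ = s[7:].partition("/")
    return "http://" + (head if sep else "")

def original_plan(uris):
    """filter(extend(?s, ?domainName)): both BINDs run for every solution."""
    results, calls = set(), 0
    for uri in uris:
        s = uri             # BIND(str(?uri) AS ?s), modelled as identity on strings
        d = domain_name(s)  # second BIND is computed before the filter runs
        calls += 1
        if s.startswith("http://"):
            results.add(d)
    return results, calls

def proposed_plan(uris):
    """extend(?domainName, filter(extend(?s))): second BIND only for survivors."""
    results, calls = set(), 0
    for uri in uris:
        s = uri
        if s.startswith("http://"):
            results.add(domain_name(s))
            calls += 1
    return results, calls

uris = ["http://a.example/1", "https://b.example/2", "urn:isbn:123", "http://c.example/3"]
print(original_plan(uris))
print(proposed_plan(uris))
```

Both shapes return the same distinct domain names, but the original one evaluates the expensive expression once per candidate solution, while the proposed one evaluates it only for solutions that pass {{STRSTARTS}}; the saving grows with the fraction of solutions the filter rejects.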


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
