[
https://issues.apache.org/jira/browse/JENA-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129864#comment-14129864
]
Andy Seaborne commented on JENA-779:
------------------------------------
I have looked at the patch and in doing so I found that
{{processExtendAssign}} is buggy. For example, if it cannot transform the inner
op, it returns null (for no change), forgetting the things it had found it could
push in one level but no further. Also, the calculation of what can be pushed
is dubious but masked in the test cases by that bug. When I fixed the first
part, it showed up other issues which are dubious at best. I'm writing the test
cases and a patch for {{processExtendAssign}} at the moment.
Breaking up {{extend}} can make sense; my preference is to work with uncombined
{{(extend)}} for uniformity, then apply {{TransformExtendCombine}} late to bring
together bits that can be aggregated, rather than generate compound ones and
have special code that has to break them up as an additional case. Now that the
algebra generator does not generate them from BIND and relies on
{{TransformExtendCombine}}, the last place they can come from is SELECT
expressions.
That leaves reordering {{extends}}. Now it feels like two orthogonal concepts:
pushing filters about on canonical forms (single extends) and reordering the
extends (which itself is tricky because of dependencies, as your patch shows).
Does that seem like a good plan? If so, I can check in the change for SELECT
expressions and we can see if it makes things simpler or not. This does not
alter the problems in {{processExtendAssign}}; it just gives the input algebra
a more uniform shape.
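As a rough illustration of the first concept only (placing a filter over a chain of single extends), here is a minimal sketch. It is a toy model, not Jena code and not the logic of the patch: the class {{FilterOverExtendsSketch}} and the method {{extendsBelowFilter}} are hypothetical names, and the chain is reduced to the list of variables each extend binds, innermost first. It ignores reordering of the extends entirely.

```java
import java.util.List;
import java.util.Set;

// Toy model only: these are not Jena classes or the logic in the patch.
final class FilterOverExtendsSketch {
    /**
     * chainVars: the variable bound by each single (extend), innermost first.
     * filterVars: the variables mentioned by the filter expression.
     * Returns how many of the innermost extends must stay below the filter.
     */
    static int extendsBelowFilter(List<String> chainVars, Set<String> filterVars) {
        int below = 0;
        for (int i = 0; i < chainVars.size(); i++) {
            if (filterVars.contains(chainVars.get(i))) {
                // The filter needs this binding, so it must sit above this extend.
                below = i + 1;
            }
        }
        return below;
    }

    public static void main(String[] args) {
        // Chain from the query in this issue: ?s is bound first, then ?domainName.
        List<String> chain = List.of("?s", "?domainName");
        System.out.println(extendsBelowFilter(chain, Set.of("?s")));
    }
}
```

For the query in this issue the filter mentions only {{?s}}, so under these assumptions one extend stays below the filter and the {{?domainName}} extend stays above it.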
The change for SELECT expressions is in AlgebraGenerator, line 583:
{noformat}
// ---- Assignments from SELECT and other places (so available to ORDER and HAVING)
if ( ! exprs.isEmpty() ) {
    for ( Var v : exprs.getVars() ) {
        Expr e = exprs.getExpr(v) ;
        op = OpExtend.create(op, v, e) ;
    }
}
{noformat}
where it was
{noformat}
// // Potential rewrites based of assign introducing aliases.
// op = OpExtend.create(op, exprs) ;
{noformat}
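To make the difference in output shape concrete, here is a small sketch that builds SSE-like strings for both styles. It is a toy stand-in, not ARQ code: {{SingleExtendSketch}}, {{extendOnePerVar}} and {{extendCombined}} are hypothetical names, and the algebra is modelled as plain strings with assignments held in an ordered map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy stand-in for algebra building; none of these names are Jena classes.
final class SingleExtendSketch {
    // New style: wrap op in one (extend) per assignment, in declaration order.
    static String extendOnePerVar(String op, Map<String, String> exprs) {
        for (Map.Entry<String, String> e : exprs.entrySet()) {
            op = "(extend ((" + e.getKey() + " " + e.getValue() + ")) " + op + ")";
        }
        return op;
    }

    // Old style: a single compound (extend) carrying every assignment.
    static String extendCombined(String op, Map<String, String> exprs) {
        if (exprs.isEmpty()) {
            return op;
        }
        StringBuilder sb = new StringBuilder("(extend (");
        boolean first = true;
        for (Map.Entry<String, String> e : exprs.entrySet()) {
            if (!first) {
                sb.append(' ');
            }
            sb.append('(').append(e.getKey()).append(' ').append(e.getValue()).append(')');
            first = false;
        }
        return sb.append(") ").append(op).append(')').toString();
    }

    public static void main(String[] args) {
        Map<String, String> exprs = new LinkedHashMap<>();
        exprs.put("?s", "(str ?uri)");
        exprs.put("?d", "(iri ?s)");
        System.out.println(extendOnePerVar("(bgp)", exprs));
        System.out.println(extendCombined("(bgp)", exprs));
    }
}
```

In the one-per-variable form each assignment is its own node, so a filter can be placed between any two of them without first splitting a compound node.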
> Filter placement should be able to break up extend
> --------------------------------------------------
>
> Key: JENA-779
> URL: https://issues.apache.org/jira/browse/JENA-779
> Project: Apache Jena
> Issue Type: Improvement
> Components: ARQ, Optimizer
> Affects Versions: Jena 2.12.0
> Reporter: Rob Vesse
> Priority: Minor
> Attachments: JENA-779-filter-extend-extend,
> JENA-779-single-extend.patch, JENA-779.patch
>
>
> The following query demonstrates a query plan seen internally which is
> considered sub-optimal.
> Consider the following query:
> {noformat}
> SELECT DISTINCT ?domainName
> {
> { ?uri ?p ?o }
> UNION
> {
> ?sub ?p ?uri
> FILTER(isIRI(?uri))
> }
> BIND(str(?uri) as ?s)
> FILTER(STRSTARTS(?s, "http://"))
> BIND(IRI(CONCAT("http://", STRBEFORE(SUBSTR(?s,8), "/"))) AS ?domainName)
> }
> {noformat}
> Which ARQ optimises as follows:
> {noformat}
> (distinct
> (project (?domainName)
> (filter (strstarts ?s "http://")
> (extend ((?s (str ?uri)) (?domainName (iri (concat "http://" (strbefore
> (substr ?s 8) "/")))))
> (union
> (bgp (triple ?uri ?p ?o))
> (filter (isIRI ?uri)
> (bgp (triple ?sub ?p ?uri))))))))
> {noformat}
> This makes the query engine do a lot of work because it computes both
> {{BIND}} expressions for many possible solutions that will then be
> rejected, when for many of them it would only be necessary to compute the
> first, simple {{BIND}} expression.
> It would be better if the query was planned as follows:
> {noformat}
> (distinct
> (project (?domainName)
> (extend (?domainName (iri (concat "http://" (strbefore (substr ?s 8)
> "/"))))
> (filter (strstarts ?s "http://")
> (extend (?s (str ?uri))
> (union
> (bgp (triple ?uri ?p ?o))
> (filter (isIRI ?uri)
> (bgp (triple ?sub ?p ?uri)))))))))
> {noformat}
> Essentially, when we try to push a filter through an {{extend}} and
> determine that we cannot push it through, we should see if we can
> split the {{extend}} instead, resulting in a partial push.
> Note that a user can re-write the original query to yield this plan if they
> make the second {{BIND}} a project expression like so:
> {noformat}
> SELECT DISTINCT (IRI(CONCAT("http://", STRBEFORE(SUBSTR(?s,8), "/"))) AS
> ?domainName)
> {
> { ?uri ?p ?o }
> UNION
> {
> ?sub ?p ?uri
> FILTER(isIRI(?uri))
> }
> BIND(str(?uri) as ?s)
> FILTER(STRSTARTS(?s, "http://"))
> }
> {noformat}