[
https://issues.apache.org/jira/browse/JENA-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rob Vesse updated JENA-473:
---------------------------
Attachment: impl-join-opt-linearized.csv
With the linearization restriction for left joins relaxed, we now get much
better performance for the implicit left join case:
With Optimization - 0.1s to First Result, 1.2s to All Results
> ARQ should be able to optimize implicit joins and implicit left joins
> ---------------------------------------------------------------------
>
> Key: JENA-473
> URL: https://issues.apache.org/jira/browse/JENA-473
> Project: Apache Jena
> Issue Type: Improvement
> Components: ARQ
> Reporter: Rob Vesse
> Assignee: Rob Vesse
> Labels: optimization, sparql
> Fix For: Jena 2.10.2
>
> Attachments: impl-join.csv, impl-join-opt.csv,
> impl-join-opt-linearized.csv
>
>
> ARQ currently does not even attempt to apply a class of useful optimizations
> that are usually referred to as implicit joins.
> A trivial example is as follows:
> SELECT *
> WHERE
> {
>   ?x ?p1 ?o1 .
>   ?y ?p2 ?o2 .
>   FILTER(?x = ?y)
> }
> Currently this requires us to compute a cross product and then apply the
> filter, which can be extremely costly even with streaming evaluation. The aim
> of this optimization is to produce a query like the following:
> SELECT *
> WHERE
> {
>   ?x ?p1 ?o1 .
>   ?x ?p2 ?o2 .
>   BIND(?x AS ?y)
> }
> This optimization can also be applied to some left joins where the implicit
> join spans both sides of the left join, e.g.
> SELECT *
> WHERE
> {
>   ?x ?p1 ?o1 .
>   OPTIONAL
>   {
>     ?y ?p2 ?o2 .
>     FILTER(?x = ?y)
>   }
> }
> This can be thought of as a generalization of TransformFilterEquality that
> covers the case where both operands are variables. Because both are
> variables, we need to be careful about when we apply this optimization: when
> = is used, we must guarantee that substituting one variable for the other
> does not alter the semantics of the query.
> I believe the optimization is safe to apply provided that we can guarantee
> (as far as possible) that at least one variable is non-literal. This can be
> done by inspecting the positions in which the mentioned variables are used
> and ensuring that at least one of the variables occurs in the graph, subject
> or predicate position.
> Safety for left joins is a little more complex: we must ensure that at least
> one of the variables occurs in the RHS, and we can only make the substitution
> in the RHS, as otherwise we change the join semantics.
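
The two safety checks described in the issue can be illustrated with a small Python sketch. This is not ARQ's implementation; the `Quad` type, the convention that terms starting with `?` are variables, and every function name below are assumptions made for this example. The idea it shows is the one stated above: a variable seen in graph, subject or predicate position cannot be bound to a literal, and since = only differs from term identity for literals, substitution is then treated as safe.

```python
from typing import List, NamedTuple, Optional, Set


class Quad(NamedTuple):
    """A triple/quad pattern. Terms starting with '?' are variables
    (a convention of this sketch, not a Jena type)."""
    subject: str
    predicate: str
    obj: str
    graph: Optional[str] = None


def non_literal_vars(patterns: List[Quad]) -> Set[str]:
    """Variables used in graph, subject or predicate position.
    Such variables can never be bound to a literal."""
    found = set()
    for q in patterns:
        for term in (q.graph, q.subject, q.predicate):
            if term is not None and term.startswith("?"):
                found.add(term)
    return found


def all_vars(patterns: List[Quad]) -> Set[str]:
    """Every variable mentioned in any position of the patterns."""
    return {t for q in patterns for t in q
            if t is not None and t.startswith("?")}


def safe_for_join(patterns: List[Quad], a: str, b: str) -> bool:
    """FILTER(a = b) may become a substitution if at least one of the
    two variables is guaranteed to be non-literal."""
    return bool({a, b} & non_literal_vars(patterns))


def safe_for_left_join(lhs: List[Quad], rhs: List[Quad],
                       a: str, b: str) -> bool:
    """Left join case: additionally require that at least one of the
    variables occurs in the RHS, where the substitution is made."""
    return safe_for_join(lhs + rhs, a, b) and bool({a, b} & all_vars(rhs))


def substitute(patterns: List[Quad], old: str, new: str) -> List[Quad]:
    """Replace variable `old` with `new` in the given patterns only
    (for a left join, pass just the RHS patterns)."""
    return [Quad(*(new if t == old else t for t in q)) for q in patterns]
```

For the OPTIONAL example in the description, `lhs` would be `[Quad("?x", "?p1", "?o1")]` and `rhs` would be `[Quad("?y", "?p2", "?o2")]`; both checks pass because `?x` and `?y` appear in subject position and `?y` occurs in the RHS, so `substitute(rhs, "?y", "?x")` rewrites only the optional part. A filter such as `?o1 = ?o2` would be rejected, since both variables occur only in object position and could be bound to literals.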
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira