[jira] [Updated] (JENA-473) ARQ should be able to optimize implicit joins and implicit left joins

Rob Vesse (JIRA) Thu, 20 Jun 2013 14:58:15 -0700

     [ 
https://issues.apache.org/jira/browse/JENA-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Rob Vesse updated JENA-473:
---------------------------

    Attachment: impl-join.csv
                impl-join-opt.csv

Example benchmarking results for the two example queries in the description, 
run using TDB with 4GB RAM and SP2B 10k as the dataset.

For the implicit join case the results are impressive (roughly 80x faster to 
first result and 65x faster to all results):

Without Optimization - 8s to First Result, 78s to All Results
With Optimization - 0.1s to First Result, 1.2s to All Results

For the implicit left join case there is currently minimal difference (if 
anything, a slight slowdown):

Without Optimization - 6s to First Result, 65s to All Results
With Optimization - 7s to First Result, 70s to All Results

The lack of improvement in the implicit left join case appears to be because, 
although the optimizer makes an appropriate substitution, the query does not 
benefit from it: the use of assign blocks the transformation of the leftjoin 
into a conditional, which is required if we are to get any benefit out of 
this optimization.

I am going to play with TransformJoinStrategy next to see if I can improve it 
to detect cases where streaming through an assign is safe.
                
> ARQ should be able to optimize implicit joins and implicit left joins
> ---------------------------------------------------------------------
>
>                 Key: JENA-473
>                 URL: https://issues.apache.org/jira/browse/JENA-473
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ
>            Reporter: Rob Vesse
>            Assignee: Rob Vesse
>              Labels: optimization, sparql
>             Fix For: Jena 2.10.2
>
>         Attachments: impl-join.csv, impl-join-opt.csv
>
>
> There is a class of useful optimizations, usually referred to as implicit 
> joins, that ARQ currently does not even attempt to apply.
> A trivial example is as follows:
> SELECT *
> WHERE
> {
>   ?x ?p1 ?o1 .
>   ?y ?p2 ?o2 .
>   FILTER(?x = ?y)
> }
> Currently this requires us to compute a cross product and then apply the 
> filter; even with streaming evaluation this can be extremely costly.  The aim 
> of this optimization is to produce a query like the following:
> SELECT *
> WHERE
> {
>   ?x ?p1 ?o1 .
>   ?x ?p2 ?o2 .
>   BIND(?x AS ?y)
> }
> This optimization can also be applied to some left joins where the implicit 
> join applies across the left join, e.g.
> SELECT *
> WHERE
> {
>   ?x ?p1 ?o1 .
>   OPTIONAL
>   {
>     ?y ?p2 ?o2 .
>     FILTER(?x = ?y)
>   }
> }
> This can be thought of as a generalization of TransformFilterEquality that 
> covers the case where both items are variables.  Because both are variables 
> we need to be careful about when we apply this optimization: when = is used 
> we must guarantee that substituting one variable for the other does not 
> alter the semantics of the query.
> I believe the optimization is safe to apply provided that we can guarantee 
> (as far as possible) that one variable is non-literal.  This can be done by 
> inspecting the positions in which the mentioned variables are used and 
> ensuring that at least one of the variables occurs in a graph, subject or 
> predicate position.
> Safety for left joins is a little more complex since we must ensure that at 
> least one of the variables occurs in the RHS and we can only make the 
> substitution in the RHS as otherwise we change the join semantics.
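
The safety check and substitution described in the quoted issue can be 
sketched roughly as follows. This is a toy model in Python, not ARQ's algebra 
or its actual transform: the Triple type and the functions is_safe and 
rewrite_implicit_join are hypothetical names used only for illustration. The 
model has no named graphs, so only subject and predicate positions are 
checked, and the left join case is not modelled.

```python
from typing import List, NamedTuple, Optional, Tuple


class Triple(NamedTuple):
    """Toy triple pattern (hypothetical); terms starting with '?' are variables."""
    s: str
    p: str
    o: str


def is_safe(var: str, triples: List[Triple]) -> bool:
    # A variable used in subject or predicate position cannot be bound to a
    # literal, which is the condition the issue gives for a safe substitution.
    return any(var in (t.s, t.p) for t in triples)


def rewrite_implicit_join(
    triples: List[Triple], keep: str, drop: str
) -> Optional[Tuple[List[Triple], Tuple[str, str]]]:
    """Model the rewrite of FILTER(keep = drop) over a basic graph pattern.

    Returns the pattern with `drop` replaced by `keep`, plus the pair
    (keep, drop) standing for BIND(keep AS drop), or None when neither
    variable is known to be non-literal.
    """
    if keep == drop:
        return None
    if not (is_safe(keep, triples) or is_safe(drop, triples)):
        return None
    rewritten = [
        Triple(*(keep if term == drop else term for term in t)) for t in triples
    ]
    return rewritten, (keep, drop)


# The trivial example from the issue: FILTER(?x = ?y) over two triple patterns.
pattern = [Triple("?x", "?p1", "?o1"), Triple("?y", "?p2", "?o2")]
print(rewrite_implicit_join(pattern, "?x", "?y"))
```

When both variables appear only in object position the function returns None 
and the filter would have to be left in place, since either variable might 
then be bound to a literal.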

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
