[ 
https://issues.apache.org/jira/browse/CALCITE-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840126#comment-16840126
 ] 

Ruben Quesada Lopez commented on CALCITE-2624:
----------------------------------------------

Thanks for your comment, [~hyuan]. I agree that, generally, this rule will not 
generate a more optimized plan. But there are some scenarios where it can help. 
I will try to summarize one:
{code}
-- select * from books b join authors a on b.authorId = a.id order by b.title 
limit 10
Sort(b.title, limit: 10)
  Join(condition: b.authorId=a.id)
    Scan(book)
    Scan(author)
{code}

The idea of this rule is to copy the Sort into the join (in this case into the 
left part), without the limit/offset attributes to not impact the final result:
{code}
-- SortJoinCopyRule applied
Sort(b.title, limit: 10)
  Join(condition: b.authorId=a.id)
    Sort(b.title)
      Scan(book)
    Scan(author)
{code}

Now, let us say that we can combine Sort+Scan into an "ElasticScan", that 
allows us to scan a table already in a given order with a reasonably good 
performance. Also, the EnumerableLimitRule will split the original Sort (which 
contained a limit) into Limit+Sort:
{code}
-- ElasticScanRule + EnumerableLimitRule applied
Limit(10)
  Sort(b.title)
    Join(condition: b.authorId=a.id)
      ElasticScan(table=book, sort=b.title)
      Scan(author)
{code}

And finally, the SortRemoveRule will detect that the sort input (i.e. the Join) 
is already sorted by b.title, so the Sort will be removed, leaving us with a 
final plan (much more optimized that the original one, where the Sort had to be 
made after the Join):
{code}
-- SortRemoveRule applied
Limit(10)
  Join(condition: b.authorId=a.id)
    ElasticScan(table=book, sort=b.title)
    Scan(author)
{code}


> Add a rule to copy a sort below a join operator
> -----------------------------------------------
>
>                 Key: CALCITE-2624
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2624
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.17.0
>            Reporter: Stamatis Zampetakis
>            Assignee: Khawla Mouhoubi
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, the only rule that allows a sort to traverse a binary operator is 
> the SortJoinTransposeRule. The rule was introduced mainly to push limits in 
> the case of left and right outer joins (see CALCITE-831).
> I assume that the main reason that we don't have more rules is that sorts 
> with limits and offsets cannot be pushed safely below many types of join 
> operators. However, in many cases, it is possible and beneficial for 
> optimization purposes to just push the sort without the limit and offset. 
> Since we do not know in advance if the join operator preserves the order we 
> cannot remove (that is why I am saying copy and not transpose) the sort 
> operator on top of the join. The latter is not really a problem since the 
> SortRemoveRule can detect such cases and remove the sort if it is redundant.
> A few concrete examples where this optimization makes sense are outlined 
> below:
>  * allow the sort to be later absorbed by an index scan and disappear from 
> the plan (Sort + Tablescan => IndexScan with RelCollation);
>  * allow operators that require sorted inputs to be exploited more easily 
> (e.g., merge join);
>  * allow the sort to be performed on a possibly smaller result (assuming that 
> the physical binary operator that is going to be used preserves the order of 
> left/right input and the top sort operator can be removed entirely).
> I propose to add a new rule (e.g., SortCopyBelowJoinRule, 
> SortJoinCopyBelowRule) which allows a sort to be copied to the left or right 
> (or to both if it is rather easy to decompose the sort) of a join operator 
> (excluding the limit and offset attributes) if the respective inputs are not 
> already sorted. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to