[
https://issues.apache.org/jira/browse/CALCITE-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840126#comment-16840126
]
Ruben Quesada Lopez commented on CALCITE-2624:
----------------------------------------------
Thanks for your comment, [~hyuan]. I agree that, generally, this rule will not
generate a more optimized plan. But there are some scenarios where it can help.
I will try to summarize one:
{code}
-- select * from books b join authors a on b.authorId = a.id order by b.title
limit 10
Sort(b.title, limit: 10)
Join(condition: b.authorId=a.id)
Scan(book)
Scan(author)
{code}
The idea of this rule is to copy the Sort into the join (in this case into the
left part), without the limit/offset attributes to not impact the final result:
{code}
-- SortJoinCopyRule applied
Sort(b.title, limit: 10)
Join(condition: b.authorId=a.id)
Sort(b.title)
Scan(book)
Scan(author)
{code}
Now, let us say that we can combine Sort+Scan into an "ElasticScan", that
allows us to scan a table already in a given order with a reasonably good
performance. Also, the EnumerableLimitRule will split the original Sort (which
contained a limit) into Limit+Sort:
{code}
-- ElasticScanRule + EnumerableLimitRule applied
Limit(10)
Sort(b.title)
Join(condition: b.authorId=a.id)
ElasticScan(table=book, sort=b.title)
Scan(author)
{code}
And finally, the SortRemoveRule will detect that the sort input (i.e. the Join)
is already sorted by b.title, so the Sort will be removed, leaving us with a
final plan (much more optimized that the original one, where the Sort had to be
made after the Join):
{code}
-- SortRemoveRule applied
Limit(10)
Join(condition: b.authorId=a.id)
ElasticScan(table=book, sort=b.title)
Scan(author)
{code}
> Add a rule to copy a sort below a join operator
> -----------------------------------------------
>
> Key: CALCITE-2624
> URL: https://issues.apache.org/jira/browse/CALCITE-2624
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.17.0
> Reporter: Stamatis Zampetakis
> Assignee: Khawla Mouhoubi
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Currently, the only rule that allows a sort to traverse a binary operator is
> the SortJoinTransposeRule. The rule was introduced mainly to push limits in
> the case of left and right outer joins (see CALCITE-831).
> I assume that the main reason that we don't have more rules is that sorts
> with limits and offsets cannot be pushed safely below many types of join
> operators. However, in many cases, it is possible and beneficial for
> optimization purposes to just push the sort without the limit and offset.
> Since we do not know in advance if the join operator preserves the order we
> cannot remove (that is why I am saying copy and not transpose) the sort
> operator on top of the join. The latter is not really a problem since the
> SortRemoveRule can detect such cases and remove the sort if it is redundant.
> A few concrete examples where this optimization makes sense are outlined
> below:
> * allow the sort to be later absorbed by an index scan and disappear from
> the plan (Sort + Tablescan => IndexScan with RelCollation);
> * allow operators that require sorted inputs to be exploited more easily
> (e.g., merge join);
> * allow the sort to be performed on a possibly smaller result (assuming that
> the physical binary operator that is going to be used preserves the order of
> left/right input and the top sort operator can be removed entirely).
> I propose to add a new rule (e.g., SortCopyBelowJoinRule,
> SortJoinCopyBelowRule) which allows a sort to be copied to the left or right
> (or to both if it is rather easy to decompose the sort) of a join operator
> (excluding the limit and offset attributes) if the respective inputs are not
> already sorted.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)