[
https://issues.apache.org/jira/browse/CALCITE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17353842#comment-17353842
]
Sylvain Crozon commented on CALCITE-4617:
-----------------------------------------
[~julianhyde] I believe the right thing to do would be to not push the offset
or limit past the join. There is no way to know what rows from the left side of
the join will match rows from the right side and so we don't know whether it's
safe to skip an entire row on the right side or not.
But not pushing the limit inside the join would mean we'd retrieve more than we
need. Intuitively I think we could push {{fetch=sort.limit+sort.fetch}} to make
sure we fetch enough to account for both once the datasets are merged.
Something like:
{code:java}
Sort(offset=1, limit=10)
Join
Scan
Sort(limit=11)
Scan{code}
I updated the PR with this logic and added a test case with the example you
suggested.
> Wrong offset when SortJoinTransposeRule pushes a sort node with an offset
> -------------------------------------------------------------------------
>
> Key: CALCITE-4617
> URL: https://issues.apache.org/jira/browse/CALCITE-4617
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.26.0
> Reporter: Sylvain Crozon
> Assignee: Ruben Q L
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.27.0
>
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> The SortJoinTransposeRule will push a sort node past a join, and then
> duplicate the sort node on top of the join. When the sort node has an offset,
> we should only apply it once, otherwise we end up skipping twice as many rows
> as we should. The sort node added on top of the join should have a null
> offset.
>
> For example the testSortJoinTranspose6 test checks that for this initial plan
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
> LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> the SortJoinTransposeRule should convert to
> {code}
> LogicalProject(DEPTNO=[$0], EMPNO=[$2])
> LogicalSort(offset=[2], fetch=[10])
> LogicalJoin(condition=[=($0, $9)], joinType=[right])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> LogicalSort(offset=[2], fetch=[10])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> Which will result in applying the offset twice. Instead the LogicalSort on
> top of the join should just have a null offset
--
This message was sent by Atlassian Jira
(v8.3.4#803005)