[
https://issues.apache.org/jira/browse/CALCITE-7104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18009439#comment-18009439
]
Zhen Chen commented on CALCITE-7104:
------------------------------------
Thank you [~julianhyde] for your reply. Let me tell you my understanding.
Please correct me if there are any omissions or inaccuracies.
Calcite already supports the elimination of duplicate keys in simple scenarios,
such as "ORDER BY x, x", which is simplified to "ORDER BY x". However, as
mentioned in the summary, cases like declaring two different aliases "d1" and
"d2" for the same column "deptno" and then wrapping them in another SELECT
cannot be directly eliminated during parsing.
I think we can rely on whether there are duplicate expressions in PROJECT(or
possibly in AGGREGATE's aggCall) to determine if there are duplicate columns
(including the expression indices referenced from the input RelNode and the
functions or other operations applied to those expressions).
By matching the SORT-PROJECT pattern, we can determine whether there are truly
identical columns in SORT, thereby eliminating duplicate sort keys in SORT.
Taking it a step further, "ORDER BY d1, d2 + 1"(where "d1" and "d2" is an alias
for deptno) could also be simplified to "ORDER BY d1".
There might be another issue: a SORT node isn't always followed by a PROJECT
containing duplicate expressions. However, the root expressions are indeed the
same—they're just deeply nested. Should we rely on other rules to construct the
SORT-PROJECT pattern for eliminating duplicate sort keys, or should we
recursively check each RelNode to see if identical expressions exist (possibly
terminating the recursion at PROJECT, AGG, or SCAN)?
> Remove duplicate sort keys
> --------------------------
>
> Key: CALCITE-7104
> URL: https://issues.apache.org/jira/browse/CALCITE-7104
> Project: Calcite
> Issue Type: Improvement
> Reporter: weihua zhang
> Priority: Major
> Labels: pull-request-available
>
> {code:sql}
> select d1 from (select deptno as d1, deptno as d2 from dept ) as tmp order
> by d1, d2;
> {code}
> {noformat}
> LogicalProject(D1=[$0]), id = 2284
> LogicalSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]), id = 2283
> LogicalProject(D1=[$0], D2=[$0]), id = 2282
> LogicalTableScan(table=[[default, dept]]), id = 2278
> {noformat}
> LogicalSort can change to:
> {noformat}
> LogicalProject(D1=[$0]), id = 2284
> LogicalSort(sort0=[$0], dir0=[ASC]), id = 2283
> LogicalProject(D1=[$0], D2=[$0]), id = 2282
> LogicalTableScan(table=[[default, dept]]), id = 2278
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)