[jira] [Commented] (CALCITE-7104) Remove duplicate sort keys

Zhen Chen (Jira) Wed, 23 Jul 2025 20:20:05 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-7104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18009439#comment-18009439
 ]


Zhen Chen commented on CALCITE-7104:
------------------------------------

Thank you [~julianhyde] for your reply. Let me tell you my understanding. 
Please correct me if there are any omissions or inaccuracies.

Calcite already supports the elimination of duplicate keys in simple scenarios, 
such as "ORDER BY x, x", which is simplified to "ORDER BY x". However, as 
mentioned in the summary, cases like declaring two different aliases "d1" and 
"d2" for the same column "deptno" and then wrapping them in another SELECT 
cannot be directly eliminated during parsing.

I think we can rely on whether there are duplicate expressions in PROJECT(or 
possibly in AGGREGATE's aggCall) to determine if there are duplicate columns 
(including the expression indices referenced from the input RelNode and the 
functions or other operations applied to those expressions).

By matching the SORT-PROJECT pattern, we can determine whether there are truly 
identical columns in SORT, thereby eliminating duplicate sort keys in SORT. 
Taking it a step further, "ORDER BY d1, d2 + 1"(where "d1" and "d2" is an alias 
for deptno) could also be simplified to "ORDER BY d1".

There might be another issue: a SORT node isn't always followed by a PROJECT 
containing duplicate expressions. However, the root expressions are indeed the 
same—they're just deeply nested. Should we rely on other rules to construct the 
SORT-PROJECT pattern for eliminating duplicate sort keys, or should we 
recursively check each RelNode to see if identical expressions exist (possibly 
terminating the recursion at PROJECT, AGG, or SCAN)?

> Remove duplicate sort keys
> --------------------------
>
>                 Key: CALCITE-7104
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7104
>             Project: Calcite
>          Issue Type: Improvement
>            Reporter: weihua zhang
>            Priority: Major
>              Labels: pull-request-available
>
> {code:sql}
> select d1 from  (select deptno as d1, deptno as d2 from dept ) as tmp order 
> by d1, d2;
> {code}
> {noformat}
> LogicalProject(D1=[$0]), id = 2284
>   LogicalSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]), id = 2283
>     LogicalProject(D1=[$0], D2=[$0]), id = 2282
>       LogicalTableScan(table=[[default, dept]]), id = 2278
> {noformat}
> LogicalSort can change to:
> {noformat}
> LogicalProject(D1=[$0]), id = 2284
>   LogicalSort(sort0=[$0], dir0=[ASC]), id = 2283
>     LogicalProject(D1=[$0], D2=[$0]), id = 2282
>       LogicalTableScan(table=[[default, dept]]), id = 2278
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-7104) Remove duplicate sort keys

Reply via email to