[ 
https://issues.apache.org/jira/browse/CALCITE-7517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CALCITE-7517:
------------------------------------
    Labels: pull-request-available  (was: )

> Support DISTINCT ON clause in SELECT statements
> -----------------------------------------------
>
>                 Key: CALCITE-7517
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7517
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.41.0
>            Reporter: Yu Xu
>            Assignee: Yu Xu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.42.0
>
>
> Currently Caclite not support DISTINCT ON syntax, such as:
>  
> {code:java}
> SELECT DISTINCT ON (deptno) empno, ename
> FROM emp
> ORDER BY deptno, hiredate DESC;{code}
>  
> many mainstream databases support this syntax, such as Postgresql ClickHouse:
> PostgreSql: https://www.postgresql.org/docs/18/sql-select.html
> ClickHouse: 
> [https://clickhouse.com/docs/sql-reference/statements/select/distinct]
> Benefits of introducing {{DISTINCT ON}} syntax into Calcite:
> *1. PostgreSQL Compatibility*
> {{DISTINCT ON}} is a standard PostgreSQL extension. Supporting it allows 
> Calcite-based systems (and downstream projects like Apache Flink, Hive, etc.) 
> to parse and execute SQL written for PostgreSQL without manual rewrite.
> *2.Cleaner Semantics for First-Row-per-Group Queries*
> It solves the common "get the first row per partition" pattern directly and 
> declaratively. Without {{{}DISTINCT ON{}}}, users must write verbose 
> {{ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)}} window expressions 
> wrapped in a CTE or sub-query, which is harder to read and maintain.
> *3.Reduced Boilerplate*
> Compare the two approaches:
> {code:java}
> -- With DISTINCT ON (concise, intent is obvious)
> SELECT DISTINCT ON (deptno) empno, ename
> FROM emp
> ORDER BY deptno, hiredate DESC;
> -- Without it (verbose, error-prone)
> WITH ranked AS (
>   SELECT empno, ename,
>     ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY hiredate DESC) AS rn
>   FROM emp
> )
> SELECT empno, ename FROM ranked WHERE rn = 1; {code}
> *4.Better Integration with ORDER BY and LIMIT*
> Because {{DISTINCT ON}} preserves the original {{ORDER BY}} semantics (it 
> only deduplicates based on the specified prefix), {{LIMIT}} and {{OFFSET}} 
> behave naturally after the deduplication step. The optimizer can also reason 
> about the sort order more easily than with an opaque window-filter pattern.
> *5. Easier for Query Builders and BI Tools*
> Many tools generate SQL programmatically. Having a single clause for "keep 
> the best match per group" simplifies code generation and avoids the 
> complexity of synthesizing window functions and outer filters.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to