[ 
https://issues.apache.org/jira/browse/SPARK-55322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-55322:
-----------------------------------
    Labels: pull-request-available  (was: )

> Add Overload for MaxBy / MinBy with k > 1
> -----------------------------------------
>
>                 Key: SPARK-55322
>                 URL: https://issues.apache.org/jira/browse/SPARK-55322
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 4.2.0
>            Reporter: Alexis Schlomer
>            Priority: Major
>              Labels: pull-request-available
>
> Adds an optional *k* parameter to {{max_by}} / {{min_by}} that returns the 
> top-K (or bottom-K) values per group, enabling concise, intent-clear queries 
> and native window-function usage. This replaces complex CTE and ranking 
> patterns with a single aggregation that returns an array of up to _k_ values 
> of {{expr1}} ordered by {{expr2}} (ties are broken non-deterministically; 
> rows with a NULL order key are excluded). The implementation uses a bounded 
> priority queue during aggregation, avoiding full sorts and large 
> materialization overhead, and aligns Spark with functionality already 
> available in engines like Snowflake, DuckDB, and Trino.
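
The bounded-priority-queue idea in the description can be illustrated with a minimal Python sketch. This is not Spark's implementation. The function name `max_by_k` and the `(value, order_key)` row layout are assumptions made for illustration only. It keeps at most k entries in a min-heap, skips rows whose order key is None (standing in for NULL), and sorts only the k survivors at the end.

```python
import heapq
from itertools import count


def max_by_k(rows, k):
    """Return up to k values with the largest order keys, largest key first.

    rows: iterable of (value, order_key) pairs (hypothetical layout).
    Rows whose order_key is None are skipped, mirroring the
    "NULL order keys excluded" rule in the description.
    """
    if k < 1:
        raise ValueError("k must be >= 1")

    heap = []      # min-heap of (order_key, seq, value), never larger than k
    seq = count()  # sequence number so values themselves are never compared

    for value, key in rows:
        if key is None:
            continue
        entry = (key, next(seq), value)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif key > heap[0][0]:
            # New key beats the smallest retained key: evict that one.
            heapq.heapreplace(heap, entry)

    # Only the (at most k) retained entries are sorted, not the whole input.
    return [v for _, _, v in sorted(heap, key=lambda e: e[0], reverse=True)]


rows = [("a", 3), ("b", 10), ("c", None), ("d", 7), ("e", 1)]
top2 = max_by_k(rows, 2)
```

Each input row costs O(log k) at most, and memory stays bounded by k per group, which is the property the description contrasts with a full sort. A bottom-K (`min_by`) variant would follow the same pattern with the comparison reversed.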



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

