[ https://issues.apache.org/jira/browse/SPARK-55322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-55322:
-----------------------------------
Labels: pull-request-available (was: )
> Add Overload for MaxBy / MinBy with k > 1
> -----------------------------------------
>
> Key: SPARK-55322
> URL: https://issues.apache.org/jira/browse/SPARK-55322
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 4.2.0
> Reporter: Alexis Schlomer
> Priority: Major
> Labels: pull-request-available
>
> Adds an optional *k* parameter to {{max_by}} / {{min_by}} to return the top-K
> (or bottom-K) values per group, enabling concise, intent-clear queries and native
> window-function usage. This replaces complex CTE and ranking patterns with a
> single aggregation that returns an array of up to _k_ values of {{expr1}},
> ordered by {{expr2}} (ties are broken non-deterministically; rows whose
> {{expr2}} is NULL are excluded). The implementation uses a bounded priority
> queue during aggregation, which avoids full sorts and large materialization
> overhead. The feature also aligns Spark with functionality already available
> in engines such as Snowflake, DuckDB, and Trino.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)