Alexis Schlomer created SPARK-55322:
---------------------------------------
Summary: Add Overload for MaxBy / MinBy with k > 1
Key: SPARK-55322
URL: https://issues.apache.org/jira/browse/SPARK-55322
Project: Spark
Issue Type: New Feature
Components: SQL
Affects Versions: 4.2.0
Reporter: Alexis Schlomer
Adds an optional _k_ parameter to {{max_by}} / {{min_by}} that returns the
top-_k_ (or bottom-_k_) values per group, enabling concise, intent-clear
queries and native use as a window function. This replaces complex CTE and
ranking patterns with a single aggregation that returns an array of up to _k_
values of {{expr1}}, ordered by {{expr2}}. Ties are broken
non-deterministically, and rows with a NULL order key are excluded. The
implementation uses a bounded priority queue during aggregation, which avoids
full sorts and large materialization overhead, and it aligns Spark with
functionality already available in engines such as Snowflake, DuckDB, and
Trino.
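
The bounded-priority-queue idea described above can be sketched outside Spark
in a few lines of Python. This is only an illustration of the technique, not
the proposed Spark implementation: the function name max_by_k, the
(value, order_key) row shape, and the first-seen tie-breaking rule are all
assumptions made for this example.

```python
import heapq
from itertools import count
from typing import Any, Iterable, List, Optional, Tuple


def max_by_k(rows: Iterable[Tuple[Any, Optional[Any]]], k: int) -> List[Any]:
    """Return up to k values with the largest order keys, largest key first.

    Hypothetical helper: `rows` yields (value, order_key) pairs, mirroring
    (expr1, expr2) in the issue description.
    """
    if k < 1:
        raise ValueError("k must be >= 1")

    # Min-heap holding at most k entries; heap[0] is the smallest key kept so far.
    heap: List[Tuple[Any, int, Any]] = []
    seq = count()  # sequence number, so values themselves are never compared

    for value, key in rows:
        if key is None:
            continue  # rows with a NULL order key are excluded
        entry = (key, next(seq), value)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif key > heap[0][0]:
            # The new key beats the current minimum: evict it in O(log k).
            heapq.heapreplace(heap, entry)

    # Only the k retained entries are sorted, never the whole input.
    ordered = sorted(heap, key=lambda e: e[0], reverse=True)
    return [value for _, _, value in ordered]


rows = [("a", 3), ("b", None), ("c", 10), ("d", 7), ("e", 1)]
print(max_by_k(rows, 2))  # ['c', 'd']
```

Each input row costs at most O(log k) and the state never grows beyond k
entries, which is what makes the approach cheaper than sorting a whole group.
A bottom-k ({{min_by}}) variant would keep the k smallest keys instead. This
sketch happens to keep the first-seen row on equal keys, whereas the proposal
leaves tie order unspecified.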