Alexis Schlomer created SPARK-55322:
---------------------------------------

             Summary: Add Overload for MaxBy / MinBy with k > 1
                 Key: SPARK-55322
                 URL: https://issues.apache.org/jira/browse/SPARK-55322
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 4.2.0
            Reporter: Alexis Schlomer


Adds an optional _k_ parameter to {{max_by}} / {{min_by}} that returns the 
top-K (or bottom-K) values per group. When _k_ is given, the function returns 
an array of up to _k_ values of {{expr1}} ordered by {{expr2}}; ties are broken 
non-deterministically, and rows whose order key is NULL are excluded. This 
replaces complex CTE and ranking patterns with a single aggregation, which 
keeps queries concise and intent-clear and allows native window-function 
usage. The implementation uses a bounded priority queue during aggregation, 
avoiding full sorts and large materialization overhead, and aligns Spark with 
functionality already available in engines like Snowflake, DuckDB, and Trino.
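
The bounded-priority-queue idea can be illustrated with a minimal sketch in 
plain Python. This is not Spark's implementation: the function name 
{{max_by_k}} and the (value, order key) row layout are hypothetical, chosen 
only to show how a heap of at most _k_ entries avoids sorting the whole group.

```python
import heapq
from itertools import count


def max_by_k(rows, k):
    """Return up to k values with the largest order keys, largest key first.

    rows is an iterable of (value, order_key) pairs. Pairs whose order_key
    is None are skipped, mirroring the "NULL order keys excluded" rule.
    Hypothetical helper for illustration only; not a Spark API.
    """
    if k <= 0:
        raise ValueError("k must be positive")

    heap = []      # min-heap of (order_key, seq, value); root = smallest kept key
    seq = count()  # unique tie-breaker so values themselves are never compared

    for value, key in rows:
        if key is None:
            continue
        entry = (key, next(seq), value)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif key > heap[0][0]:
            # New key beats the smallest retained key: evict and insert.
            heapq.heapreplace(heap, entry)

    # At most k entries remain, so this final sort is cheap.
    return [value for _, _, value in sorted(heap, reverse=True)]


rows = [("a", 3), ("b", None), ("c", 10), ("d", 7), ("e", 1)]
print(max_by_k(rows, 2))
```

Each row costs O(log k) at most and memory stays at O(k) per group. A 
bottom-K variant would follow the same pattern with the comparison reversed. 
Which of several equal keys is retained depends on input order here, which is 
consistent with ties being non-deterministic.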
