Zikun created SPARK-32096:
-----------------------------
Summary: Support top-N sort for Spark SQL window function
Key: SPARK-32096
URL: https://issues.apache.org/jira/browse/SPARK-32096
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.0
Environment: Any environment that supports Spark.
Reporter: Zikun
Fix For: 3.1.0
In Spark SQL, there are two types of sort execution: *_SortExec_* and
*_TakeOrderedAndProjectExec_*.
*_SortExec_* is the general-purpose sort execution; it does not support top-N sort.
*_TakeOrderedAndProjectExec_* is the execution used for top-N sort in Spark.
The Spark SQL rank window function needs to sort the data locally, and it relies
on the *_SortExec_* execution plan to sort the data in each physical data
partition. When a filter on the window rank (e.g. rank <= 100) is specified
in a user's query, the filter can be pushed down to SortExec so that
SortExec performs a top-N sort instead of a full sort.
Right now SortExec does not support top-N sort, so we need to extend
SortExec to support it.
Alternatively, if SortExec is not considered the right place for this, we can
create a new execution plan called topNSortExec that does a top-N sort in each
local partition when a filter on the rank is specified.
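To make the intended behavior concrete, here is a minimal plain-Python sketch of a top-N sort driven by a rank filter. It is not Spark code and not the proposed implementation: the function name rows_with_rank_at_most, its parameters, and the sample rows are all hypothetical, and the input list only stands in for the rows of one physical partition. The point it illustrates is that rank() assigns tied rows the same rank, so a filter such as rank <= N must keep every row tied with the N-th one.

```python
import heapq
from collections import defaultdict


def rows_with_rank_at_most(rows, partition_key, order_key, n):
    """Return (row, rank) pairs with rank <= n for every window group.

    `rows` stands in for the contents of one physical partition
    (hypothetical helper, not a Spark API).
    """
    if n <= 0:
        return []

    # Bucket rows by their PARTITION BY key.
    groups = defaultdict(list)
    for row in rows:
        groups[partition_key(row)].append(row)

    result = []
    for key in sorted(groups):
        members = groups[key]
        # Top-N selection instead of a full sort of the group.
        smallest = heapq.nsmallest(n, members, key=order_key)
        # Rows tied with the n-th smallest row share its rank, so keep them.
        threshold = order_key(smallest[-1])
        kept = sorted(
            (r for r in members if order_key(r) <= threshold), key=order_key
        )

        # Assign rank(): tied rows share a rank, the next rank skips ahead.
        rank = 0
        previous = None
        for position, row in enumerate(kept, start=1):
            current = order_key(row)
            if position == 1 or current != previous:
                rank = position
            previous = current
            result.append((row, rank))
    return result


# Sample data: (window partition key, ordering value).
rows = [("a", 3), ("a", 1), ("a", 1), ("a", 2), ("b", 5), ("b", 4)]
top2 = rows_with_rank_at_most(rows, lambda r: r[0], lambda r: r[1], n=2)
top3 = rows_with_rank_at_most(rows, lambda r: r[0], lambda r: r[1], n=3)
```

In this sketch only the rows that can pass the rank filter are fully sorted; the remaining rows are discarded after the top-N selection. A real operator would also need to bound memory while streaming rows, which this sketch does not attempt.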