XiDuo You created SPARK-41220:
---------------------------------
Summary: Range partitioner sample supports column pruning
Key: SPARK-41220
URL: https://issues.apache.org/jira/browse/SPARK-41220
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.4.0
Reporter: XiDuo You
When doing a global sort, we first sample the data to compute the range bounds,
then use the range partitioner to perform the shuffle exchange.
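The two-phase flow described above (sample to get range bounds, then route rows by range) can be sketched in plain Python. This is a toy illustration under stated assumptions, not Spark's `RangePartitioner`. The helper names `range_bounds`, `partition_id` and `global_sort` are hypothetical, and the evenly spaced bound selection is a simplification of what Spark actually does.

```python
import bisect
import random


def range_bounds(sample_keys, num_partitions):
    """Pick num_partitions - 1 evenly spaced boundaries from a sample of sort keys."""
    ordered = sorted(sample_keys)
    if not ordered:
        return []
    return [ordered[i * len(ordered) // num_partitions] for i in range(1, num_partitions)]


def partition_id(key, bounds):
    """Count how many bounds are strictly below the key; keys above every bound go last."""
    return bisect.bisect_left(bounds, key)


def global_sort(keys, num_partitions, sample_size, seed=0):
    rng = random.Random(seed)
    # Phase 1 (hypothetical sample plan): sample the data to estimate range bounds.
    sample = rng.sample(keys, min(sample_size, len(keys)))
    bounds = range_bounds(sample, num_partitions)
    # Phase 2 (hypothetical shuffle plan): route each key to the partition owning its range.
    partitions = [[] for _ in range(num_partitions)]
    for k in keys:
        partitions[partition_id(k, bounds)].append(k)
    # Sorting each partition locally makes the concatenation globally ordered.
    return [sorted(p) for p in partitions]
```

Because `partition_id` never decreases as the key grows, every key in partition `i` is less than or equal to every key in partition `i + 1`. Only local sorts are needed after the shuffle.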
The issue is that the sample plan is coupled with the shuffle plan, which prevents
us from optimizing the sample plan. The sample plan only needs the columns in the
sort order, but the shuffle plan contains all data columns. So, at a minimum, we
can apply column pruning to the sample plan so that it fetches only the ordering
columns.
A common example is: `OPTIMIZE table ZORDER BY columns`
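The following toy model illustrates the proposed decoupling. It assumes a columnar source in which the cost of a scan depends on which columns are read. The sampling step scans only the sort-order columns, and the shuffle still reads full rows. `ColumnarTable`, `sample_bounds` and `shuffle` are hypothetical names that exist only for this sketch. They do not correspond to Spark classes or APIs.

```python
import bisect
import random


class ColumnarTable:
    """Hypothetical columnar table that records which columns have been scanned."""

    def __init__(self, columns):
        self.columns = columns  # column name -> list of values
        self.scanned = set()

    def scan(self, names):
        self.scanned.update(names)
        return list(zip(*(self.columns[n] for n in names)))


def sample_bounds(table, sort_cols, num_partitions, sample_size, seed=0):
    # Pruned sample plan: read only the sort-order columns.
    keys = table.scan(sort_cols)
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    if not sample:
        return []
    return [sample[i * len(sample) // num_partitions] for i in range(1, num_partitions)]


def shuffle(table, sort_cols, bounds, num_partitions):
    # Shuffle plan: whole rows move, so every column must be read here.
    all_cols = list(table.columns)
    key_idx = [all_cols.index(c) for c in sort_cols]
    partitions = [[] for _ in range(num_partitions)]
    for row in table.scan(all_cols):
        key = tuple(row[i] for i in key_idx)
        partitions[bisect.bisect_left(bounds, key)].append(row)
    return partitions
```

Only the shuffle step pays for the wide payload columns. Sampling touches the ordering columns alone, which is the saving the proposal aims for in workloads such as `OPTIMIZE ... ZORDER BY`.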
--
This message was sent by Atlassian Jira
(v8.20.10#820010)