XiDuo You created SPARK-41220:
---------------------------------

             Summary: Range partitioner sample supports column pruning
                 Key: SPARK-41220
                 URL: https://issues.apache.org/jira/browse/SPARK-41220
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.4.0
            Reporter: XiDuo You


When doing a global sort, we first sample the data to compute the range 
bounds, then use the range partitioner for the shuffle exchange.
The issue is that the sample plan is coupled with the shuffle plan, so we 
cannot optimize the sample plan on its own. The sample plan only needs the 
sort-order columns, but the shuffle plan contains all data columns. So, at a 
minimum, we can apply column pruning to the sample plan so that it fetches 
only the ordering columns.

A common example is: `OPTIMIZE table ZORDER BY columns`
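
To illustrate the idea outside of Spark, here is a minimal plain-Python sketch. It is not Spark's implementation: the helper names (`sample_keys`, `range_bounds`, `range_partition`) and the tuple row layout are hypothetical. It only shows that the sampling step needs the ordering column, while the shuffle step still moves whole rows.

```python
import bisect
import random


def sample_keys(rows, key_fn, sample_size, seed=0):
    """Sample sort keys only (the 'pruned' projection), never whole rows."""
    # Column pruning: project each row down to its ordering column first.
    keys = [key_fn(row) for row in rows]
    rng = random.Random(seed)
    return rng.sample(keys, min(sample_size, len(keys)))


def range_bounds(sampled_keys, num_partitions):
    """Pick up to num_partitions - 1 upper bounds from the sorted sample."""
    ordered = sorted(sampled_keys)
    if not ordered:
        return []
    bounds = []
    for i in range(1, num_partitions):
        idx = min(len(ordered) - 1, i * len(ordered) // num_partitions)
        bounds.append(ordered[idx])
    return bounds


def range_partition(rows, key_fn, bounds):
    """Route full rows to partitions; keys <= bounds[i] land in partition i or lower."""
    parts = [[] for _ in range(len(bounds) + 1)]
    for row in rows:
        parts[bisect.bisect_left(bounds, key_fn(row))].append(row)
    return parts


# Hypothetical data: (sort_key, wide_payload). Only sort_key is sampled.
rows = [(i * 7 % 101, "payload-%d" % i) for i in range(101)]
order_key = lambda row: row[0]

bounds = range_bounds(sample_keys(rows, order_key, sample_size=20), num_partitions=4)
parts = range_partition(rows, order_key, bounds)
```

In this sketch the wide payload column is never touched while computing `bounds`, which is the saving the proposal is after for cases such as Z-ordering a wide table.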





