[ 
https://issues.apache.org/jira/browse/SPARK-57851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh updated SPARK-57851:
--------------------------------
    Description: 
Part of the SPIP umbrella SPARK-56978 (Faster queries in local laptop mode), 
covering the third category: shuffle-free local execution for small queries.

This adds a conservative optimizer rule (MarkSingleTaskExecution) that marks 
small single-partition scans, optionally with a shuffle-inducing operator on 
top (sort, aggregate, distinct, window, limit/offset, expand), as candidates 
for single-task execution. Such a scan reports a SinglePartition output 
partitioning, allowing EnsureRequirements to elide the shuffle that would 
otherwise be inserted before the operator on top. The shuffle is not required 
for correctness, so removing it reduces scheduling overhead for small, 
low-latency queries.

The feature is controlled by a new set of internal configs under 
spark.sql.optimizer.singleTaskExecution.* and is disabled by default.

This initial change covers the core operator set; join is intentionally left 
out and can be added as a follow-up. Union is already covered by the existing 
spark.sql.unionOutputPartitioning.

> Shuffle-free single-task execution for small queries
> ----------------------------------------------------
>
>                 Key: SPARK-57851
>                 URL: https://issues.apache.org/jira/browse/SPARK-57851
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 5.0.0
>            Reporter: L. C. Hsieh
>            Priority: Major
>
> Part of the SPIP umbrella SPARK-56978 (Faster queries in local laptop mode), 
> covering the third category: shuffle-free local execution for small queries.
> This adds a conservative optimizer rule (MarkSingleTaskExecution) that marks 
> small single-partition scans, optionally with a shuffle-inducing operator on 
> top (sort, aggregate, distinct, window, limit/offset, expand), as candidates 
> for single-task execution. Such a scan reports a SinglePartition output 
> partitioning, allowing EnsureRequirements to elide the shuffle that would 
> otherwise be inserted before the operator on top. The shuffle is not required 
> for correctness, so removing it reduces scheduling overhead for small, 
> low-latency queries.
> The feature is controlled by a new set of internal configs under 
> spark.sql.optimizer.singleTaskExecution.* and is disabled by default.
> This initial change covers the core operator set; join is intentionally left 
> out and can be added as a follow-up. Union is already covered by the existing 
> spark.sql.unionOutputPartitioning.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to