[
https://issues.apache.org/jira/browse/SPARK-57851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
L. C. Hsieh updated SPARK-57851:
--------------------------------
Description:
Part of the SPIP umbrella SPARK-56978 (Faster queries in local laptop mode),
covering the third category: shuffle-free local execution for small queries.
This adds a conservative optimizer rule (MarkSingleTaskExecution) that marks
small single-partition scans, optionally with a shuffle-inducing operator on
top (sort, aggregate, distinct, window, limit/offset, expand), as candidates
for single-task execution. Such a scan reports a SinglePartition output
partitioning, allowing EnsureRequirements to elide the shuffle that would
otherwise be inserted before the operator on top. The shuffle is not required
for correctness, so removing it reduces scheduling overhead for small,
low-latency queries.
The feature is controlled by a new set of internal configs under
spark.sql.optimizer.singleTaskExecution.* and is disabled by default.
This initial change covers the core operator set; join is intentionally left
out and can be added as a follow-up. Union is already covered by the existing
spark.sql.unionOutputPartitioning.
> Shuffle-free single-task execution for small queries
> ----------------------------------------------------
>
> Key: SPARK-57851
> URL: https://issues.apache.org/jira/browse/SPARK-57851
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 5.0.0
> Reporter: L. C. Hsieh
> Priority: Major
>
> Part of the SPIP umbrella SPARK-56978 (Faster queries in local laptop mode),
> covering the third category: shuffle-free local execution for small queries.
> This adds a conservative optimizer rule (MarkSingleTaskExecution) that marks
> small single-partition scans, optionally with a shuffle-inducing operator on
> top (sort, aggregate, distinct, window, limit/offset, expand), as candidates
> for single-task execution. Such a scan reports a SinglePartition output
> partitioning, allowing EnsureRequirements to elide the shuffle that would
> otherwise be inserted before the operator on top. The shuffle is not required
> for correctness, so removing it reduces scheduling overhead for small,
> low-latency queries.
> The feature is controlled by a new set of internal configs under
> spark.sql.optimizer.singleTaskExecution.* and is disabled by default.
> This initial change covers the core operator set; join is intentionally left
> out and can be added as a follow-up. Union is already covered by the existing
> spark.sql.unionOutputPartitioning.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]