[
https://issues.apache.org/jira/browse/SPARK-57194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Emily Sun updated SPARK-57194:
------------------------------
Description:
h2. Problem
Custom optimizer rules injected via *SparkSessionExtensions.injectOptimizerRule*
run inside the fixed-point *Operator Optimization* batch, alongside built-in
rewriters like _FoldablePropagation_, _ConstantFolding_, and
_PushDownPredicates_.
A custom rule that needs to observe the original plan shape (e.g. cross-side
join predicates before they are folded into single-side constants) can be
silently defeated when a built-in rule transforms the plan first within the
same fixed point.
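To make the failure mode concrete, here is a minimal, self-contained sketch. It is a toy model, not Spark code: {{Expr}}, {{Rule}}, {{propagateConstant}}, {{markCrossSide}}, and {{fixedPoint}} are hypothetical names invented for illustration, and the sketch assumes the built-in rule is ordered ahead of the custom rule within each iteration.
{code:scala}
// Toy model only: none of these types are Spark classes.
object FixedPointDemo {
  sealed trait Expr
  final case class Attr(side: String, name: String) extends Expr
  final case class Lit(value: Int) extends Expr
  final case class EqualTo(left: Expr, right: Expr) extends Expr
  final case class And(left: Expr, right: Expr) extends Expr
  // Marker the hypothetical custom rule leaves on a cross-side predicate.
  final case class Marked(pred: EqualTo) extends Expr

  type Rule = Expr => Expr

  // Stand-in for a built-in rewriter: given `x = y AND y = <literal>`,
  // substitutes the literal for y in the first conjunct.
  val propagateConstant: Rule = {
    case And(EqualTo(l, r), c @ EqualTo(r2, lit: Lit)) if r == r2 =>
      And(EqualTo(l, lit), c)
    case other => other
  }

  // Stand-in for a custom rule: marks an equality between attributes
  // that come from different join sides.
  val markCrossSide: Rule = {
    case And(e @ EqualTo(Attr(s1, _), Attr(s2, _)), rest) if s1 != s2 =>
      And(Marked(e), rest)
    case other => other
  }

  // Applies all rules in order, repeating until the plan stops changing.
  def fixedPoint(rules: Seq[Rule], plan: Expr, maxIterations: Int = 100): Expr = {
    var current = plan
    var changed = true
    var i = 0
    while (changed && i < maxIterations) {
      val next = rules.foldLeft(current)((p, rule) => rule(p))
      changed = next != current
      current = next
      i += 1
    }
    current
  }

  def containsMarker(e: Expr): Boolean = e match {
    case _: Marked     => true
    case And(l, r)     => containsMarker(l) || containsMarker(r)
    case EqualTo(l, r) => containsMarker(l) || containsMarker(r)
    case _             => false
  }

  // left.a = right.b AND right.b = 1
  val cond: Expr = And(
    EqualTo(Attr("left", "a"), Attr("right", "b")),
    EqualTo(Attr("right", "b"), Lit(1)))

  // Built-in rule first, custom rule second, in the same fixed-point loop.
  val result: Expr = fixedPoint(Seq(propagateConstant, markCrossSide), cond)

  def main(args: Array[String]): Unit = {
    println(result)
    println(s"custom rule fired: ${containsMarker(result)}")
  }
}
{code}
In this sketch the cross-side equality is rewritten before the custom rule is applied, so the custom rule finds nothing to match in any iteration and no error or warning is produced.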
Existing extension points don't cover this:
* *extendedOperatorOptimizationRules*: runs inside the same fixed-point batch
* *extendedResolutionRules / postHocResolutionRules*: analyzer phase, too early
* *earlyScanPushDownRules*: runs after optimization, scoped to scan pushdown
h2. Proposed change
Add *earlyOperatorOptimizationRules* on *Optimizer*, executed in a *Once*
batch named "Early Operator Optimization" placed between *Replace Operators*
and *Aggregate*, before the fixed-point Operator Optimization batch.
{code:scala}
Batch("Replace Operators", fixedPoint,
  RewriteExceptAll,
  RewriteIntersectAll,
  ReplaceIntersectWithSemiJoin,
  ReplaceExceptWithFilter,
  ReplaceExceptWithAntiJoin,
  ReplaceDistinctWithAggregate,
  ReplaceDeduplicateWithAggregate) ::
// New batch: runs once, before the fixed-point Operator Optimization batch.
Batch("Early Operator Optimization", Once,
  earlyOperatorOptimizationRules: _*) ::
Batch("Aggregate", fixedPoint,
  RemoveLiteralFromGroupExpressions,
  RemoveRepetitionFromGroupExpressions) :: Nil ++
{code}
The rules are wired through *SparkSessionExtensions.injectEarlyOptimizerRule*
and *BaseSessionStateBuilder*. The batch is a no-op when no rule is registered.
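The intended batch ordering can be sketched with a small stand-alone model. This is an illustration under stated assumptions, not the Spark implementation: {{Pred}}, {{Batch}}, {{execute}}, {{foldCrossSide}}, and {{tagCrossSide}} are hypothetical stand-ins, and only the batch names and the *Once* / fixed-point strategies are taken from the proposal.
{code:scala}
// Toy model only: a plan is a list of join conjuncts, not a Spark LogicalPlan.
object EarlyBatchDemo {
  sealed trait Pred
  final case class CrossSide(leftCol: String, rightCol: String) extends Pred
  final case class SingleSide(col: String, value: Int) extends Pred
  final case class Tagged(pred: Pred) extends Pred

  type Plan = List[Pred]
  type Rule = Plan => Plan

  sealed trait Strategy
  case object Once extends Strategy
  final case class FixedPoint(maxIterations: Int) extends Strategy

  final case class Batch(name: String, strategy: Strategy, rules: Rule*)

  // Runs batches in sequence; each batch loops until stable or its limit is hit.
  def execute(batches: Seq[Batch], plan: Plan): Plan =
    batches.foldLeft(plan) { (start, batch) =>
      val limit = batch.strategy match {
        case Once          => 1
        case FixedPoint(n) => n
      }
      var current = start
      var changed = true
      var i = 0
      while (changed && i < limit) {
        val next = batch.rules.foldLeft(current)((p, rule) => rule(p))
        changed = next != current
        current = next
        i += 1
      }
      current
    }

  // Stand-in for a built-in rewriter: `l = r` plus `r = v` becomes `l = v`.
  val foldCrossSide: Rule = plan => plan.map {
    case c @ CrossSide(l, r) =>
      plan.collectFirst { case SingleSide(col, v) if col == r => SingleSide(l, v) }
        .getOrElse(c)
    case other => other
  }

  // Stand-in for a custom rule that must see cross-side predicates intact.
  val tagCrossSide: Rule = _.map {
    case c: CrossSide => Tagged(c)
    case other        => other
  }

  // The early batch sits ahead of the fixed-point batch; earlyRules may be empty.
  def batches(earlyRules: Seq[Rule]): Seq[Batch] = Seq(
    Batch("Early Operator Optimization", Once, earlyRules: _*),
    Batch("Operator Optimization", FixedPoint(100), foldCrossSide))

  val plan: Plan = List(CrossSide("l.a", "r.b"), SingleSide("r.b", 1))

  val withEarlyRule: Plan = execute(batches(Seq(tagCrossSide)), plan)
  val withoutEarlyRule: Plan = execute(batches(Nil), plan)

  def main(args: Array[String]): Unit = {
    println(withEarlyRule)
    println(withoutEarlyRule)
  }
}
{code}
With a rule registered, the custom rule sees the cross-side predicate before the fixed-point batch can rewrite it. With an empty rule list, the early batch leaves the plan untouched and the result matches what the fixed-point batch alone would produce.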
h2. Compatibility
Purely additive: no existing API or batch ordering changes; default is
{{Nil}}.
> Add earlyOperatorOptimizationRules extension point to Optimizer
> ---------------------------------------------------------------
>
> Key: SPARK-57194
> URL: https://issues.apache.org/jira/browse/SPARK-57194
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 3.4.1
> Reporter: Emily Sun
> Priority: Major
>