Ali Afroozeh created SPARK-30072:
------------------------------------
Summary: Create dedicated planner for subqueries
Key: SPARK-30072
URL: https://issues.apache.org/jira/browse/SPARK-30072
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.0
Environment: This PR changes subquery planning by calling the planner
and plan preparation rules on the subquery plan directly. Previously, we
created a QueryExecution instance for each subquery to get the executedPlan.
This re-ran analysis and optimization on the subquery's plan. Running the
analysis again on an already optimized query plan can have unwanted
consequences, as some rules, for example DecimalPrecision, are not idempotent.
As an example, consider the expression 1.7 * avg(x) which after applying the
DecimalPrecision rule becomes:
promote_precision(1.7) * promote_precision(avg(x))
After the optimization, more specifically the constant folding rule, this
expression becomes:
1.7 * promote_precision(avg(x))
Now if we run the analyzer on this optimized query again, we will get:
promote_precision(1.7) * promote_precision(promote_precision(avg(x)))
Which will later be optimized to:
1.7 * promote_precision(promote_precision(avg(x)))
As can be seen, re-running the analysis and optimization on this expression
results in an expression with extra nested promote_precision nodes. Adding
unneeded nodes to the plan is problematic because two plans that are
semantically the same no longer compare as equal, which can prevent plan reuse.
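
The effect described above can be reproduced with a small toy model. This is
only a sketch: the classes Lit, Avg, Promote and Mul and the functions analyze
and optimize are hypothetical stand-ins written for illustration, not Spark's
actual expression classes or rules. The toy analyze wraps both operands of a
multiplication unconditionally, and the toy optimize folds promote_precision
around a literal.

```python
from dataclasses import dataclass


# Hypothetical mini expression tree; these are NOT Spark's classes.
@dataclass(frozen=True)
class Lit:
    value: float


@dataclass(frozen=True)
class Avg:
    column: str


@dataclass(frozen=True)
class Promote:
    child: object


@dataclass(frozen=True)
class Mul:
    left: object
    right: object


def analyze(expr):
    """Toy stand-in for DecimalPrecision: wrap both operands of a
    multiplication, without checking whether they are already wrapped."""
    if isinstance(expr, Mul):
        return Mul(Promote(expr.left), Promote(expr.right))
    return expr


def optimize(expr):
    """Toy stand-in for constant folding: promote_precision(literal)
    folds back to the literal; everything else is left in place."""
    if isinstance(expr, Mul):
        return Mul(optimize(expr.left), optimize(expr.right))
    if isinstance(expr, Promote):
        child = optimize(expr.child)
        return child if isinstance(child, Lit) else Promote(child)
    return expr


def show(expr):
    """Render an expression in the notation used in this description."""
    if isinstance(expr, Lit):
        return str(expr.value)
    if isinstance(expr, Avg):
        return f"avg({expr.column})"
    if isinstance(expr, Promote):
        return f"promote_precision({show(expr.child)})"
    return f"{show(expr.left)} * {show(expr.right)}"


expr = Mul(Lit(1.7), Avg("x"))
once = optimize(analyze(expr))    # analyzed and optimized one time
twice = optimize(analyze(once))   # analysis and optimization re-run

print(show(once))
print(show(twice))
print(once == twice)  # the two plans no longer compare as equal
```

In this toy model, the plan that went through analysis twice carries one more
promote_precision node than the plan that went through it once, so an
equality-based reuse check would treat them as different plans.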
We opted to introduce a dedicated planner for subqueries, instead of making the
DecimalPrecision rule idempotent, because this eliminates the entire category
of problems caused by re-running analysis on an optimized plan. Another benefit
is that planning time for subqueries is reduced.
Reporter: Ali Afroozeh