Nimesh Khandelwal created SPARK-51068:
-----------------------------------------
Summary: CTEs are not canonicalized, so cached results are not
used and are recomputed
Key: SPARK-51068
URL: https://issues.apache.org/jira/browse/SPARK-51068
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.3.2, 3.1.3, 3.1.2, 4.0.0
Reporter: Nimesh Khandelwal
To check whether a plan is already cached, CacheManager compares the
canonicalized form of the plan against the canonicalized forms of the cached
plans. Canonicalization currently does not normalize CTE IDs, so queries that
use CTEs produce unnecessary cache misses even when an equivalent plan is
cached. The problem dates from the commit [Avoid inlining non-deterministic
With-CTEs|https://github.com/apache/spark/pull/33671/files], which introduced
CTERelationDef and CTERelationRef without handling their canonicalization.
In the reproduction below, the query against cached_cte recomputes the join
instead of reading the cached data.
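The mismatch can be illustrated with a toy model in plain Python. This is only
a sketch of the idea, not Spark's code: the classes CTEDef, CTERef and Plan and
the canonicalize function are hypothetical stand-ins, and the sketch assumes
that each analysis of the same query assigns fresh CTE IDs.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


# Hypothetical stand-ins for plan nodes; these are NOT Spark's classes.
@dataclass(frozen=True)
class CTEDef:
    cte_id: int
    body: str  # placeholder for the CTE's child plan


@dataclass(frozen=True)
class CTERef:
    cte_id: int


@dataclass(frozen=True)
class Plan:
    defs: Tuple[CTEDef, ...]
    refs: Tuple[CTERef, ...]


def canonicalize(plan: Plan) -> Plan:
    """Renumber CTE ids by definition order so equivalent plans compare equal."""
    mapping: Dict[int, int] = {d.cte_id: i for i, d in enumerate(plan.defs)}
    return Plan(
        defs=tuple(CTEDef(mapping[d.cte_id], d.body) for d in plan.defs),
        refs=tuple(CTERef(mapping[r.cte_id]) for r in plan.refs),
    )


# The same query analyzed twice, with different ids each time (assumption).
first = Plan((CTEDef(7, "cte1"), CTEDef(8, "cte2")), (CTERef(7), CTERef(8)))
second = Plan((CTEDef(21, "cte1"), CTEDef(22, "cte2")), (CTERef(21), CTERef(22)))

# Keyed on the raw plan, the ids leak into the key and the lookup misses.
raw_cache = {first: "cached data"}
raw_hit = second in raw_cache

# Keyed on the id-normalized plan, the lookup succeeds.
canonical_cache = {canonicalize(first): "cached data"}
canonical_hit = canonicalize(second) in canonical_cache

print(raw_hit, canonical_hit)
```

In this model the fix amounts to normalizing the IDs before the plans are
compared; how that should be done inside Spark's canonicalization is left to
the actual patch.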
{code:java}
>>> spark.sql("""
... CACHE TABLE cached_cte AS
... WITH cte1 AS (SELECT 1 AS id, 'Alice' AS name UNION ALL SELECT 2 AS id, 'Bob' AS name),
...      cte2 AS (SELECT 1 AS id, 10 AS score UNION ALL SELECT 2 AS id, 20 AS score)
... SELECT cte1.id, cte1.name, cte2.score FROM cte1 JOIN cte2 ON cte1.id = cte2.id
... """)
DataFrame[]
>>> spark.sql("select count(*) from cached_cte").explain()
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[], functions=[count(1)])
   +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=165]
      +- HashAggregate(keys=[], functions=[partial_count(1)])
         +- Project
            +- BroadcastHashJoin [id#120], [id#124], Inner, BuildRight, false
               :- Union
               :  :- Project [1 AS id#120]
               :  :  +- Scan OneRowRelation[]
               :  +- Project [2 AS id#122]
               :     +- Scan OneRowRelation[]
               +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)),false), [plan_id=160]
                  +- Union
                     :- Project [1 AS id#124]
                     :  +- Scan OneRowRelation[]
                     +- Project [2 AS id#126]
                        +- Scan OneRowRelation[]
{code}