Nimesh Khandelwal created SPARK-51068:
-----------------------------------------

             Summary: CTEs are not canonicalized, causing cached results to be ignored and recomputed
                 Key: SPARK-51068
                 URL: https://issues.apache.org/jira/browse/SPARK-51068
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.3.2, 3.1.3, 3.1.2, 4.0.0
            Reporter: Nimesh Khandelwal


To check whether a plan is already cached, CacheManager compares the 
canonicalized version of the plan. Currently, canonicalization does not 
normalize CTE IDs, so queries that use CTEs miss the cache unnecessarily even 
when an equivalent plan has already been cached. The issue was introduced by 
the commit 
[Avoid inlining non-deterministic With-CTEs|https://github.com/apache/spark/pull/33671/files], 
which added CTERelationDef and CTERelationRef without handling their 
canonicalization. In the example below, querying cached_cte recomputes both 
CTEs (the Union subtrees joined by BroadcastHashJoin) instead of reading the 
cached data.
{code:python}
>>> spark.sql("""CACHE TABLE cached_cte AS
... WITH cte1 AS (SELECT 1 AS id, 'Alice' AS name UNION ALL SELECT 2 AS id, 'Bob' AS name),
...      cte2 AS (SELECT 1 AS id, 10 AS score UNION ALL SELECT 2 AS id, 20 AS score)
... SELECT cte1.id, cte1.name, cte2.score FROM cte1 JOIN cte2 ON cte1.id = cte2.id""")
DataFrame[]
>>> spark.sql("select count(*) from cached_cte").explain()
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[], functions=[count(1)])
   +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=165]
      +- HashAggregate(keys=[], functions=[partial_count(1)])
         +- Project
            +- BroadcastHashJoin [id#120], [id#124], Inner, BuildRight, false
               :- Union
               :  :- Project [1 AS id#120]
               :  :  +- Scan OneRowRelation[]
               :  +- Project [2 AS id#122]
               :     +- Scan OneRowRelation[]
               +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)),false), [plan_id=160]
                  +- Union
                     :- Project [1 AS id#124]
                     :  +- Scan OneRowRelation[]
                     +- Project [2 AS id#126]
                        +- Scan OneRowRelation[]{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
