[jira] [Updated] (SPARK-43946) Add rule to remove unused CTEDef

jinhai-cloud (Jira) Fri, 02 Jun 2023 00:44:11 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-43946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


jinhai-cloud updated SPARK-43946:
---------------------------------
    Description: 
{code:sql}
with t1 as (
  select rand() c3
),
-- t2 references t1 but is itself never referenced by the main query
t2 as (select * from t1)
select c3 from t1 where c3 > 0
{code}
{code}
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.InlineCTE ===
 WithCTE                                               WithCTE
 :- CTERelationDef 0, false                            :- CTERelationDef 0, false
 :  +- Project [rand(3418873542988342437) AS c3#236]   :  +- Project [rand(3418873542988342437) AS c3#236]
 :     +- OneRowRelation                               :     +- OneRowRelation
!:- CTERelationDef 1, false                            +- Project [c3#236]
!:  +- Project [c3#236]                                   +- Filter (c3#236 > cast(0 as double))
!:     +- CTERelationRef 0, true, [c3#236]                   +- CTERelationRef 0, true, [c3#236]
!+- Project [c3#236]
!   +- Filter (c3#236 > cast(0 as double))
!      +- CTERelationRef 0, true, [c3#236]
{code}
When the InlineCTE rule is applied to the query above, CTERelationDef 0 is not 
inlined because its refCount is 2: one reference from the main query and one 
from the unused t2 (CTERelationDef 1).

However, the plan produced by the rule contains only one CTERelationRef to 
CTERelationDef 0, so its effective refCount is 1 and the plan can be optimized 
further by inlining it.

Therefore, we can add a rule, *RemoveRedundantCTEDef*, that deletes 
unreferenced CTERelationDef nodes so that the refCount is not miscalculated, 
and the plan can be optimized to:
{code}
Project [c3#236]
+- Filter (c3#236 > cast(0 as double))
   +- Project [rand(-7871530451581327544) AS c3#236]
      +- OneRowRelation
{code}
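
The refCount discrepancy described above can be illustrated with a small toy 
model. This is only a sketch and not Spark's implementation: the dictionary 
layout and the function names (naive_ref_counts, reachable_ref_counts, 
remove_unused_defs) are hypothetical and chosen for illustration. Each CTE 
definition id maps to the list of CTE ids its body references, and the main 
query is a separate list of references.

```python
from collections import Counter


def naive_ref_counts(cte_defs, main_refs):
    """Count every reference, including those inside unreferenced definitions."""
    counts = Counter({cte_id: 0 for cte_id in cte_defs})
    for body_refs in cte_defs.values():
        counts.update(body_refs)
    counts.update(main_refs)
    return dict(counts)


def reachable_ref_counts(cte_defs, main_refs):
    """Count only the references that are reachable from the main query."""
    counts = {cte_id: 0 for cte_id in cte_defs}
    pending = list(main_refs)
    while pending:
        cte_id = pending.pop()
        counts[cte_id] += 1
        if counts[cte_id] == 1:
            # First visit: this definition is live, so its own references count.
            pending.extend(cte_defs[cte_id])
    return counts


def remove_unused_defs(cte_defs, main_refs):
    """Drop definitions that nothing reachable from the main query refers to."""
    counts = reachable_ref_counts(cte_defs, main_refs)
    return {cte_id: refs for cte_id, refs in cte_defs.items() if counts[cte_id] > 0}


# Toy version of the query above: def 0 is t1, def 1 is t2 (which reads t1),
# and the main query reads t1 only.
cte_defs = {0: [], 1: [0]}
main_refs = [0]

naive = naive_ref_counts(cte_defs, main_refs)
reachable = reachable_ref_counts(cte_defs, main_refs)
pruned = remove_unused_defs(cte_defs, main_refs)
```

In this model the naive count for definition 0 is 2, while the reachable count 
is 1, and pruning leaves only definition 0. With a single remaining reference, 
a definition like this becomes a candidate for inlining.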

> Add rule to remove unused CTEDef
> --------------------------------
>
>                 Key: SPARK-43946
>                 URL: https://issues.apache.org/jira/browse/SPARK-43946
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.4, 3.3.2, 3.4.0
>            Reporter: jinhai-cloud
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
