[ 
https://issues.apache.org/jira/browse/SPARK-55438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18057309#comment-18057309
 ] 

loong commented on SPARK-55438:
-------------------------------

The plan of one union branch:


{code:java}
Project [id#7 AS id_#47]
+- Project [id#7]
   +- SubqueryAlias spark_catalog.default.tmp
      +- Relation default.tmp[id#7] parquet
{code}

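The CombineUnions rule merges the top two Projects into one. Dataset.unionByName applies
CombineUnions directly when it builds the Union, before the cache lookup runs. The merged
plan no longer contains the Project [id#7] subtree, which is the cached plan of df, so the
cache cannot be matched.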
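
To make the mechanism concrete, here is a toy model in Scala. Everything in it is hypothetical: the classes Plan, Relation, Project and Union and the helpers collapse and containsSubtree are not Catalyst's API. As a simplification, it assumes the cache lookup succeeds when the cached plan appears unchanged as a subtree of the query plan. Spark's CacheManager compares plan fragments with sameResult on canonicalized plans, which the toy replaces with plain equality.

```scala
// Toy model only: these classes are hypothetical and are NOT Spark's Catalyst API.
object ProjectCollapseToy {
  sealed trait Plan
  final case class Relation(name: String, output: Seq[String]) extends Plan
  // Each column is (outputName, sourceColumn): the Project reads sourceColumn from its child.
  final case class Project(columns: Seq[(String, String)], child: Plan) extends Plan
  final case class Union(children: Seq[Plan]) extends Plan

  // Merge a Project sitting directly on another Project, loosely mimicking the
  // Project collapsing that CombineUnions applies to each union child.
  def collapse(plan: Plan): Plan = plan match {
    case Project(outer, Project(inner, grandChild)) =>
      val innerMap = inner.toMap
      // Rewrite every outer column to read straight from the inner Project's source.
      val merged = outer.map { case (name, src) => name -> innerMap.getOrElse(src, src) }
      collapse(Project(merged, grandChild))
    case Project(columns, child) => Project(columns, collapse(child))
    case Union(children)         => Union(children.map(collapse))
    case r: Relation             => r
  }

  // Simplified cache lookup: true if `cached` appears unchanged somewhere in `plan`.
  def containsSubtree(plan: Plan, cached: Plan): Boolean =
    plan == cached || (plan match {
      case Project(_, child) => containsSubtree(child, cached)
      case Union(children)   => children.exists(containsSubtree(_, cached))
      case _: Relation       => false
    })

  val tmp    = Relation("default.tmp", Seq("id"))
  val cached = Project(Seq("id" -> "id"), tmp)     // plan of df: Project [id]
  val branch = Project(Seq("id_" -> "id"), cached) // df.select($"id".as("id_"))
  val before = Union(Seq(branch, branch))          // df2 before the Projects are merged
  val after  = collapse(before)                    // df2 after the Projects are merged

  def main(args: Array[String]): Unit = {
    println(containsSubtree(before, cached)) // true: the cached plan can be matched
    println(containsSubtree(after, cached))  // false: the inner Project is gone
  }
}
```

Running main prints true and then false: once each branch collapses to a single Project over the relation, no fragment equals the cached plan of df any more, which mirrors the plan shown above.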


> Rule CombineUnions makes cache invalid
> --------------------------------------
>
>                 Key: SPARK-55438
>                 URL: https://issues.apache.org/jira/browse/SPARK-55438
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.8
>            Reporter: loong
>            Priority: Major
>
> The following code reproduces this issue: the job triggered by df2.show() cannot
> use the cache of df.
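> {code:scala}
>     import org.apache.spark.sql.DataFrame
>     import spark.implicits._  // enables the $"col" syntax
>
>     spark.sql("select 1 as id").write.saveAsTable("tmp")
>     val df: DataFrame = spark.table("tmp")
>       .select($"id")
>     df.persist()
>     df.count()  // materialize the cache of df
>     // Both union branches are built on df, so both should read from its cache.
>     val df2 = df.select($"id".as("id_"))
>       .unionByName(df.select($"id".as("id_")))
>     df2.show()  // this job does not use the cache of df
> {code}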



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
