[
https://issues.apache.org/jira/browse/SPARK-55438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18057309#comment-18057309
]
loong commented on SPARK-55438:
-------------------------------
The analyzed plan of each union child (df.select($"id".as("id_"))) is:
{code:java}
Project [id#7 AS id_#47]
+- Project [id#7]
   +- SubqueryAlias spark_catalog.default.tmp
      +- Relation default.tmp[id#7] parquet
{code}
The top two Projects are merged by the CombineUnions rule, so the resulting
plan can no longer match the cached plan of df.
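One way to confirm the cache miss (a sketch, not part of the original report): when the cached plan of df is matched, the physical plan of df2 contains an InMemoryTableScan node, so checking for that node shows whether the cache is actually reused.
{code:scala}
// Sketch: check whether df2's physical plan reads from the cache of df.
// Assumes the df / df2 from the reproduction below have been created and
// df.count() has already materialized the cache.
val usesCache = df2.queryExecution.executedPlan.toString.contains("InMemoryTableScan")
// Expected to print true if the cache were reused; per this report it prints false.
println(s"df2 reuses the cache of df: $usesCache")
{code}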
> Rule CombineUnions makes cache invalid
> --------------------------------------
>
> Key: SPARK-55438
> URL: https://issues.apache.org/jira/browse/SPARK-55438
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.8
> Reporter: loong
> Priority: Major
>
> The following code reproduces this issue: the job df2.show() cannot use the
> cache of df.
> {code:scala}
> spark.sql("select 1 as id").write.saveAsTable("tmp")
> val df: DataFrame = spark.table("tmp")
>   .select($"id")
> df.persist()
> df.count()
> val df2 = df.select($"id".as("id_"))
>   .unionByName(df.select($"id".as("id_")))
> df2.show()
> {code}