[
https://issues.apache.org/jira/browse/SPARK-55438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
loong updated SPARK-55438:
--------------------------
Description:
The following code reproduces the issue: the job triggered by df2.show() cannot
use the cache of df.
{code:java}
spark.sql("select 1 as id").write.saveAsTable("tmp")
val df: DataFrame = spark.table("tmp")
  .select($"id")
df.persist()
df.count()
val df2 = df.select($"id".as("id_"))
  .unionByName(df.select($"id".as("id_")))
df2.show()
{code}
was:
{code:java}
spark.sql("select 1 as id").write.saveAsTable("tmp")
val df: DataFrame = spark.table("tmp")
  .select($"id")
df.persist()
df.count()
df.select($"id".as("id_"))
  .unionByName(df.select($"id".as("id_")))
  .show()
{code}
> Rule CombineUnions makes cache invalid
> --------------------------------------
>
> Key: SPARK-55438
> URL: https://issues.apache.org/jira/browse/SPARK-55438
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.8
> Reporter: loong
> Priority: Major
>
> The following code reproduces the issue: the job triggered by df2.show() cannot
> use the cache of df.
> {code:java}
> spark.sql("select 1 as id").write.saveAsTable("tmp")
> val df: DataFrame = spark.table("tmp")
>   .select($"id")
> df.persist()
> df.count()
> val df2 = df.select($"id".as("id_"))
>   .unionByName(df.select($"id".as("id_")))
> df2.show()
> {code}
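
One possible way to confirm the reported behaviour is to check whether the plan Spark builds for each DataFrame still contains a cached relation after cache substitution. The sketch below is not part of the original report: the object name CacheUsageCheck, the helper usesCache, the local master setting, and the overwrite save mode are assumptions added for illustration. It relies on QueryExecution.withCachedData and InMemoryRelation, which are internal developer APIs and may change between Spark versions.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.execution.columnar.InMemoryRelation

// Hypothetical helper object, not taken from the report.
object CacheUsageCheck {

  // True if the logical plan, after cache substitution, contains at least
  // one InMemoryRelation, i.e. some part of the query reads cached data.
  def usesCache(ds: DataFrame): Boolean =
    ds.queryExecution.withCachedData.collect {
      case relation: InMemoryRelation => relation
    }.nonEmpty

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough to reproduce the issue.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SPARK-55438-check")
      .getOrCreate()
    import spark.implicits._

    // Same steps as the reproduction; "overwrite" is added so the
    // sketch can be rerun without failing on an existing table.
    spark.sql("select 1 as id").write.mode("overwrite").saveAsTable("tmp")
    val df = spark.table("tmp").select($"id")
    df.persist()
    df.count()

    val df2 = df.select($"id".as("id_"))
      .unionByName(df.select($"id".as("id_")))

    println(s"df uses cache:  ${usesCache(df)}")
    println(s"df2 uses cache: ${usesCache(df2)}")

    spark.stop()
  }
}
```

If the report holds on an affected version, the check for df2 would be expected to come back false while the check for df comes back true. Calling df2.explain(true) gives the same information in readable form.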