[ https://issues.apache.org/jira/browse/SPARK-55438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

loong updated SPARK-55438:
--------------------------
    Description: 
The following code reproduces this issue: the job triggered by df2.show() cannot use the
cached data of df.
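{code:scala}
    // Run in spark-shell, where `spark` is the active SparkSession.
    import org.apache.spark.sql.DataFrame
    import spark.implicits._

    spark.sql("select 1 as id").write.saveAsTable("tmp")
    val df: DataFrame = spark.table("tmp")
      .select($"id")
    df.persist()
    df.count() // materializes the cache
    val df2 = df.select($"id".as("id_"))
      .unionByName(df.select($"id".as("id_")))
    df2.show() // expected to read df from the cache, but does not
{code}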
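A minimal way to check whether a query is actually served from the cache is to look for InMemoryRelation nodes in the plan after cache substitution (QueryExecution.withCachedData). The sketch below assumes spark-shell on Spark 3.5.x, so that spark is the active SparkSession, and it recreates the tmp table with overwrite mode. The helper cachedScans and the value single are hypothetical names introduced only for this comparison; the snippet contrasts a single projection over df with the union from the reproduction.

{code:scala}
// Sketch for spark-shell on Spark 3.5.x; assumes `spark` is the active SparkSession.
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.columnar.InMemoryRelation
import spark.implicits._

// Hypothetical helper: counts InMemoryRelation nodes in the plan produced after
// the CacheManager substitutes cached data. 0 means nothing is read from the cache.
def cachedScans(d: DataFrame): Int =
  d.queryExecution.withCachedData.collect { case r: InMemoryRelation => r }.size

spark.sql("select 1 as id").write.mode("overwrite").saveAsTable("tmp")
val df: DataFrame = spark.table("tmp").select($"id")
df.persist()
df.count() // materializes the cache

// A single projection over df, without a union.
val single = df.select($"id".as("id_"))
// The union from the reproduction above.
val df2 = df.select($"id".as("id_")).unionByName(df.select($"id".as("id_")))

println(s"single: ${cachedScans(single)} cached scan(s), df2: ${cachedScans(df2)} cached scan(s)")
{code}

If the behavior reported here occurs, single should report at least one cached scan while df2 reports none. The same difference should be visible as the presence or absence of an InMemoryTableScan operator in the output of explain().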


  was:

{code:java}
    spark.sql("select 1 as id").write.saveAsTable("tmp")
    val df: DataFrame = spark.table("tmp")
      .select($"id")
    df.persist()
    df.count()
    df.select($"id".as("id_"))
      .unionByName(df.select($"id".as("id_")))
      .show()
{code}



> Rule CombineUnions makes cache invalid
> --------------------------------------
>
>                 Key: SPARK-55438
>                 URL: https://issues.apache.org/jira/browse/SPARK-55438
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.8
>            Reporter: loong
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
