[ https://issues.apache.org/jira/browse/SPARK-55438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18057313#comment-18057313 ]

loong commented on SPARK-55438:
-------------------------------

The `unionByName` method is currently implemented as follows:
{code:scala}
def unionByName(other: Dataset[T], allowMissingColumns: Boolean): Dataset[T] = withSetOperator {
  // This breaks caching, but it's usually ok because it addresses a very specific use case:
  // using union to union many files or partitions.
  CombineUnions(Union(logicalPlan :: other.logicalPlan :: Nil, true, allowMissingColumns))
} {code}
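My question is: could we use the `withCachedData` logical plan (the lazy val on `QueryExecution`) in place of `logicalPlan` here, so that cached data is substituted before `CombineUnions` rewrites the plan?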
{code:scala}
lazy val withCachedData: LogicalPlan = sparkSession.withActive {
  assertAnalyzed()
  assertSupported()
  // clone the plan to avoid sharing the plan instance between different stages like analyzing,
  // optimizing and planning.
  sparkSession.sharedState.cacheManager.useCachedData(commandExecuted.clone())
} {code}
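The ordering question can be illustrated with a small toy model. Everything in it is hypothetical: `Plan`, `CachedScan`, `useCachedData` and `combineUnions` are simplified stand-ins, not Spark's `LogicalPlan`, `InMemoryRelation`, `CacheManager` or `CombineUnions`. It also assumes, as one plausible mechanism for the reproduction below, that the union rule collapses a `Project` over a `Project` inside a union child, so the cached `select($"id")` fragment no longer appears as a subtree. Under those assumptions it shows that a cache matched by exact subtree is lost when the plan is rewritten before the lookup, and kept when cached data is substituted first.

```scala
// Toy model only: these types are hypothetical stand-ins, not Spark classes.
sealed trait Plan
case class Relation(name: String) extends Plan
// `cols` holds output expressions such as "id" or "id AS id_".
case class Project(cols: List[String], child: Plan) extends Plan
case class Union(children: List[Plan]) extends Plan
// Stand-in for a scan over cached data (the role InMemoryRelation plays in Spark).
case class CachedScan(plan: Plan) extends Plan

object CacheOrderingModel {
  // Replace every subtree equal to `cached` with a cached scan, mimicking a
  // cache lookup that matches plan fragments exactly.
  def useCachedData(plan: Plan, cached: Plan): Plan =
    if (plan == cached) CachedScan(cached)
    else plan match {
      case Project(cols, child) => Project(cols, useCachedData(child, cached))
      case Union(children)      => Union(children.map(useCachedData(_, cached)))
      case other                => other
    }

  // Hypothetical simplified union rule: flatten nested unions and collapse
  // Project-over-Project in union children. Toy simplification: assumes the
  // outer Project only uses columns the inner Project passes through unchanged.
  def combineUnions(plan: Plan): Plan = plan match {
    case Union(children) => Union(children.flatMap(flattenChild))
    case other           => other
  }

  private def flattenChild(plan: Plan): List[Plan] = plan match {
    case Union(children)                    => children.flatMap(flattenChild)
    case Project(outer, Project(_, inner))  => flattenChild(Project(outer, inner))
    case other                              => List(other)
  }

  // True if any part of the plan reads from a cached scan.
  def usesCache(plan: Plan): Boolean = plan match {
    case CachedScan(_)     => true
    case Project(_, child) => usesCache(child)
    case Union(children)   => children.exists(usesCache)
    case _                 => false
  }

  // The persisted `df` from the reproduction: select id from tmp.
  val cached: Plan = Project(List("id"), Relation("tmp"))
  // df.select($"id".as("id_")), used on both sides of the union.
  val branch: Plan = Project(List("id AS id_"), cached)
  val unionPlan: Plan = Union(List(branch, branch))

  // Current order: rewrite the union first, then look up cached data.
  val combineThenCache: Plan = useCachedData(combineUnions(unionPlan), cached)
  // Proposed order: substitute cached data first (as withCachedData does), then rewrite.
  val cacheThenCombine: Plan = combineUnions(useCachedData(unionPlan, cached))

  def main(args: Array[String]): Unit = {
    println(s"combine then cache uses cache: ${usesCache(combineThenCache)}")
    println(s"cache then combine uses cache: ${usesCache(cacheThenCombine)}")
  }
}
```

Running `main` prints `false` for the current order and `true` for the proposed order. In the model the collapse no longer fires after substitution because the inner child is a `CachedScan` rather than a `Project`.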
 

> Rule CombineUnions makes cache invalid
> --------------------------------------
>
>                 Key: SPARK-55438
>                 URL: https://issues.apache.org/jira/browse/SPARK-55438
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.8
>            Reporter: loong
>            Priority: Major
>
> The following code reproduces this issue: the job triggered by df2.show() cannot use the cache of df.
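> {code:scala}
>     import org.apache.spark.sql.DataFrame
>     import spark.implicits._ // needed for the $"..." column syntax
>
>     spark.sql("select 1 as id").write.saveAsTable("tmp")
>     val df: DataFrame = spark.table("tmp")
>       .select($"id")
>     df.persist()
>     df.count() // materializes the cache
>     val df2 = df.select($"id".as("id_"))
>       .unionByName(df.select($"id".as("id_")))
>     df2.show() // should read from the cache of df, but does not
> {code}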



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
