[jira] [Updated] (SPARK-52339) Relations may appear equal even though they are different

Cheng Pan (Jira) Tue, 24 Jun 2025 14:52:26 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-52339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Cheng Pan updated SPARK-52339:
------------------------------
    Fix Version/s: 3.5.7

> Relations may appear equal even though they are different
> ---------------------------------------------------------
>
>                 Key: SPARK-52339
>                 URL: https://issues.apache.org/jira/browse/SPARK-52339
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.6, 4.0.0
>            Reporter: Bruce Robbins
>            Assignee: Bruce Robbins
>            Priority: Major
>              Labels: correctness, pull-request-available
>             Fix For: 4.1.0, 4.0.1, 3.5.7
>
>
> For example:
> {noformat}
> // create test data
> val data = Seq((1, 2), (2, 3)).toDF("a", "b")
> data.write.mode("overwrite").csv("/tmp/test")
> val fileList1 = List.fill(2)("/tmp/test")
> val fileList2 = List.fill(3)("/tmp/test")
> val df1 = spark.read.schema("a int, b int").csv(fileList1: _*)
> val df2 = spark.read.schema("a int, b int").csv(fileList2: _*)
> df1.count() // correctly returns 4
> df2.count() // correctly returns 6
> // the following is the same as above, except df1 is persisted
> val df1 = spark.read.schema("a int, b int").csv(fileList1: _*).persist
> val df2 = spark.read.schema("a int, b int").csv(fileList2: _*)
> df1.count() // correctly returns 4
> df2.count() // incorrectly returns 4!!
> {noformat}
> df1 and df2 were created with a different number of paths: df1 has 2, and df2 
> has 3. But since the distinct set of root paths is the same (e.g., 
> {{Set("/tmp/test") == Set("/tmp/test")}}), the two dataframes are considered 
> equal. Thus, when df1 is persisted, df2 uses df1's cached plan.
> The same bug also causes inappropriate exchange reuse.
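
The collision described above can be reproduced without Spark. The following is a minimal sketch, not Spark's actual code: SetKeyedRelation and SeqKeyedRelation are hypothetical stand-ins for a file-based relation, and only the path lists come from the report. It shows how equality on the distinct set of root paths makes a 2-path and a 3-path relation compare equal, while equality on the full path sequence keeps them apart. Comparing the full sequence is an assumption about one way to avoid the collision; the fix merged for this issue may differ.

```scala
// Hypothetical stand-in for a file-based relation; not a Spark class.
// Equality is keyed on the distinct set of root paths, as the report describes.
final class SetKeyedRelation(val paths: Seq[String]) {
  override def equals(other: Any): Boolean = other match {
    case that: SetKeyedRelation => paths.toSet == that.paths.toSet
    case _                      => false
  }
  override def hashCode(): Int = paths.toSet.hashCode()
}

// Same hypothetical stand-in, but equality keeps duplicates and order.
final class SeqKeyedRelation(val paths: Seq[String]) {
  override def equals(other: Any): Boolean = other match {
    case that: SeqKeyedRelation => paths == that.paths
    case _                      => false
  }
  override def hashCode(): Int = paths.hashCode()
}

// Path lists taken from the report: the same root path, 2 times and 3 times.
val fileList1 = List.fill(2)("/tmp/test")
val fileList2 = List.fill(3)("/tmp/test")

// Set-based equality collapses the duplicates, so the relations look identical.
val setEqual = new SetKeyedRelation(fileList1) == new SetKeyedRelation(fileList2)

// Sequence-based equality sees 2 paths versus 3 and keeps them distinct.
val seqEqual = new SeqKeyedRelation(fileList1) == new SeqKeyedRelation(fileList2)
```

Pasted into a Scala REPL or spark-shell, setEqual evaluates to true and seqEqual to false. Any cache or reuse lookup keyed on the first kind of equality would hand the 3-path relation the result computed for the 2-path one, which matches the symptom in the report.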



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

