[
https://issues.apache.org/jira/browse/SPARK-23677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Martin Mauch resolved SPARK-23677.
----------------------------------
Resolution: Duplicate
> Selecting columns from joined DataFrames with the same origin yields wrong
> results
> ----------------------------------------------------------------------------------
>
> Key: SPARK-23677
> URL: https://issues.apache.org/jira/browse/SPARK-23677
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 2.2.1, 2.3.0
> Reporter: Martin Mauch
> Priority: Major
>
> When joining two DataFrames derived from the same origin DataFrame and then
> selecting columns from the join through the DataFrame references
> (small("num") and big("num")), Spark can't distinguish between the columns
> and gives a wrong (or at least very surprising) result. One can work around
> this by selecting the alias-qualified column names with expr.
> Here is a minimal example:
>
> {code:java}
> import org.apache.spark.sql.functions.expr
> import spark.implicits._
> val edf = Seq((1), (2), (3), (4), (5)).toDF("num")
> val big = edf.where(edf("num") > 2).alias("big")
> val small = edf.where(edf("num") < 4).alias("small")
> // Selecting through the DataFrame references: both columns show the same values
> small.join(big, expr("big.num == (small.num + 1)")).select(small("num"),
> big("num")).show()
> // +---+---+
> // |num|num|
> // +---+---+
> // | 2| 2|
> // | 3| 3|
> // +---+---+
> // Workaround: selecting the alias-qualified names via expr gives the expected pairs
> small.join(big, expr("big.num == (small.num + 1)")).select(expr("small.num"),
> expr("big.num")).show()
> // +---+---+
> // |num|num|
> // +---+---+
> // | 2| 3|
> // | 3| 4|
> // +---+---+
> {code}
>
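As a supplementary sketch (not part of the original report), the same alias-qualified lookup can probably also be written with col instead of expr. This assumes a Spark 2.x classpath and a local session; the app name and the output column names small_num and big_num are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a local session for experimentation; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("SPARK-23677-sketch") // hypothetical name
  .getOrCreate()
import spark.implicits._

val edf = Seq(1, 2, 3, 4, 5).toDF("num")
val big = edf.where(edf("num") > 2).alias("big")
val small = edf.where(edf("num") < 4).alias("small")

// Refer to the columns by alias-qualified name instead of small("num") / big("num"),
// so each side of the self-join is resolved through its alias.
val joined = small.join(big, col("big.num") === col("small.num") + 1)

// Renaming the outputs (hypothetical names) keeps the result unambiguous downstream.
val result = joined.select(
  col("small.num").as("small_num"),
  col("big.num").as("big_num")
)
result.show()
```

If this behaves like the expr variant in the report, the result should contain the pairs (2, 3) and (3, 4); row order is not guaranteed.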
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)