[ https://issues.apache.org/jira/browse/SPARK-23677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Martin Mauch resolved SPARK-23677.
----------------------------------
    Resolution: Duplicate

> Selecting columns from joined DataFrames with the same origin yields wrong results
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-23677
>                 URL: https://issues.apache.org/jira/browse/SPARK-23677
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 2.2.1, 2.3.0
>            Reporter: Martin Mauch
>            Priority: Major
>
> When joining two DataFrames that derive from the same origin DataFrame and
> then selecting columns from the join result, Spark can't distinguish between
> the columns and gives a wrong (or at least very surprising) result. One can
> work around this by referencing the columns through expr.
> Here is a minimal example:
>
> {code:java}
> import org.apache.spark.sql.functions.expr
> import spark.implicits._
>
> val edf = Seq((1), (2), (3), (4), (5)).toDF("num")
> val big = edf.where(edf("num") > 2).alias("big")
> val small = edf.where(edf("num") < 4).alias("small")
>
> // Selecting through the DataFrame handles: both columns show the same values
> small.join(big, expr("big.num == (small.num + 1)")).select(small("num"), big("num")).show()
> // +---+---+
> // |num|num|
> // +---+---+
> // |  2|  2|
> // |  3|  3|
> // +---+---+
>
> // Workaround: selecting through expr with the alias-qualified names
> small.join(big, expr("big.num == (small.num + 1)")).select(expr("small.num"), expr("big.num")).show()
> // +---+---+
> // |num|num|
> // +---+---+
> // |  2|  3|
> // |  3|  4|
> // +---+---+
> {code}
>
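
For readers who land on this resolved issue, here is a minimal sketch of the same workaround written with col instead of expr. It is not taken from the report: it assumes that col("small.num") resolves through the DataFrame alias in the same way as expr("small.num"), and the object name, the local SparkSession settings, and the output column names small_num and big_num are illustrative choices only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical standalone wrapper; the report itself runs in a spark-shell session.
object AliasQualifiedSelect {
  def main(args: Array[String]): Unit = {
    // Assumption: a local single-threaded session is enough for this sketch.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-23677-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same data and filters as in the report.
    val edf = Seq(1, 2, 3, 4, 5).toDF("num")
    val big = edf.where(col("num") > 2).alias("big")
    val small = edf.where(col("num") < 4).alias("small")

    // Refer to columns by alias-qualified name rather than through the
    // small(...) / big(...) handles, which share the same origin column.
    val joined = small.join(big, col("big.num") === (col("small.num") + 1))

    // Renaming the outputs (illustrative names) keeps the two columns
    // distinguishable in any later step.
    joined
      .select(col("small.num").as("small_num"), col("big.num").as("big_num"))
      .show()

    spark.stop()
  }
}
```

Under the stated assumption, this should return the same rows as the expr-based select in the report. Giving the two output columns distinct names is a design choice that avoids carrying a second pair of identically named columns forward.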