[jira] [Resolved] (SPARK-23677) Selecting columns from joined DataFrames with the same origin yields wrong results

Martin Mauch (JIRA) Fri, 16 Mar 2018 10:25:23 -0700

     [ https://issues.apache.org/jira/browse/SPARK-23677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Martin Mauch resolved SPARK-23677.
----------------------------------
    Resolution: Duplicate

> Selecting columns from joined DataFrames with the same origin yields wrong 
> results
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-23677
>                 URL: https://issues.apache.org/jira/browse/SPARK-23677
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 2.2.1, 2.3.0
>            Reporter: Martin Mauch
>            Priority: Major
>
> When joining two DataFrames derived from the same origin DataFrame and then 
> selecting columns from the join through the DataFrame references, Spark can't 
> distinguish between the columns and gives a wrong (or at least very 
> surprising) result. One can work around this by referring to the columns 
> through their aliases with expr.
> Here is a minimal example:
>  
> {code:java}
> import org.apache.spark.sql.functions.expr
> import spark.implicits._
> val edf = Seq(1, 2, 3, 4, 5).toDF("num")
> val big = edf.where(edf("num") > 2).alias("big")
> val small = edf.where(edf("num") < 4).alias("small")
> // Selecting through the DataFrame references small(...) and big(...): wrong result
> small.join(big, expr("big.num == (small.num + 1)")).select(small("num"), 
> big("num")).show()
> // +---+---+
> // |num|num|
> // +---+---+
> // |  2|  2|
> // |  3|  3|
> // +---+---+
> // Selecting through the aliases with expr: expected result
> small.join(big, expr("big.num == (small.num + 1)")).select(expr("small.num"), 
> expr("big.num")).show()
> // +---+---+
> // |num|num|
> // +---+---+
> // |  2|  3|
> // |  3|  4|
> // +---+---+
> {code}
>  
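> The expr workaround above resolves the columns by their alias-qualified 
> names. A minimal sketch of the same idea using col instead of expr, assuming 
> a local SparkSession; the names joined and pairs and the output column names 
> small_num and big_num are illustrative and not part of the original report:
>  
> {code:scala}
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.functions.col
>
> // Assumption: a local session; in spark-shell `spark` already exists
> val spark = SparkSession.builder().master("local[*]").appName("SPARK-23677").getOrCreate()
> import spark.implicits._
>
> val edf = Seq(1, 2, 3, 4, 5).toDF("num")
> val big = edf.where(edf("num") > 2).alias("big")
> val small = edf.where(edf("num") < 4).alias("small")
>
> // Refer to both sides only through their aliases, never through small(...) or big(...)
> val joined = small.join(big, col("big.num") === col("small.num") + 1)
>   .select(col("small.num").as("small_num"), col("big.num").as("big_num"))
>
> // Collect as (small, big) pairs; expected to match the expr variant: (2, 3) and (3, 4)
> val pairs = joined.collect().map(r => (r.getInt(0), r.getInt(1))).toSet
> joined.show()
> {code}
>  
> Renaming the selected columns also keeps the result unambiguous for any later 
> step that refers to them by name.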



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
