[ 
https://issues.apache.org/jira/browse/SPARK-14040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15474239#comment-15474239
 ] 

Nattavut Sutyanyong commented on SPARK-14040:
---------------------------------------------

The root cause of this problem is the way Spark generates a unique identifier 
for each column: it provides no obvious way to distinguish multiple 
references to the same column. This problem has been discovered in different 
contexts, and different approaches to fixing it have been discussed in 
various places:

SPARK-14040
SPARK-17337

A partial fix was implemented in the {{dedupRight()}} method for the {{Join}} 
operator, and the latest attempt to fix this is in SPARK-17154. We should 
solve this problem at its root cause. I will post my idea in SPARK-17154. We 
should close this JIRA as a duplicate.
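
To illustrate the root cause, here is a minimal sketch in plain Scala. It is 
a toy model, not Spark's actual classes: {{Attr}}, {{Frame}}, {{filtered}}, 
{{renamed}} and the ID values are hypothetical names chosen for illustration. 
It assumes only what the warning in the report suggests, namely that a filter 
passes column identifiers through unchanged while a rename creates a fresh 
one.

```scala
// Toy model (an assumption for illustration, not Spark's real API):
// a column is a name plus a unique ID, like "c#18232" in the warning.
final case class Attr(name: String, exprId: Long)

// A hypothetical frame reduced to the columns it exposes.
final case class Frame(output: Seq[Attr]) {
  // Look up a column by name, as a("c") does in the report.
  def apply(name: String): Attr = output.find(_.name == name).get

  // A filter changes rows, not columns: the IDs pass through untouched.
  def filtered: Frame = this

  // A rename yields a new column with a fresh ID (supplied by the caller here).
  def renamed(from: String, to: String, freshId: Long): Frame =
    Frame(output.map(attr => if (attr.name == from) Attr(to, freshId) else attr))
}

object SelfJoinIds {
  val b: Frame = Frame(Seq(Attr("a", 1L), Attr("b", 2L), Attr("c", 3L)))
  val a: Frame = b.filtered.renamed("a", "filta", 4L).renamed("b", "filtb", 5L)

  // Both sides of the join resolve "c" to the very same identifier,
  // so a("c") === b("c") compares a column with itself.
  val triviallyTrue: Boolean = a("c") == b("c")

  def main(args: Array[String]): Unit = {
    val left = a("c")
    val right = b("c")
    println(s"a.c = ${left.name}#${left.exprId}, b.c = ${right.name}#${right.exprId}")
    println(s"trivially true: $triviallyTrue")
  }
}
```

In this model the renamed columns are distinguishable from their originals, 
but {{c}} is not, which is why a fix at the level of the identifiers 
themselves seems preferable to patching individual operators.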



> Null-safe and equality join produces incorrect result with filtered dataframe
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-14040
>                 URL: https://issues.apache.org/jira/browse/SPARK-14040
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0
>         Environment: Ubuntu Linux 15.10
>            Reporter: Denton Cockburn
>
> Initial issue reported here: 
> http://stackoverflow.com/questions/36131942/spark-join-produces-wrong-results
>       val b = Seq(("a", "b", 1), ("a", "b", 2)).toDF("a", "b", "c")
>       val a = b.where("c = 1").withColumnRenamed("a", "filta").withColumnRenamed("b", "filtb")
>       a.join(b, $"filta" <=> $"a" and $"filtb" <=> $"b" and a("c") <=> b("c"), "left_outer").show
> Produces 2 rows instead of the expected 1.
>       a.withColumn("newc", $"c").join(b, $"filta" === $"a" and $"filtb" === $"b" and $"newc" === b("c"), "left_outer").show
> Also produces 2 rows instead of the expected 1.
> The only one that seemed to work correctly was:
>       a.join(b, $"filta" === $"a" and $"filtb" === $"b" and a("c") === b("c"), "left_outer").show
> But that produced a warning:
>       WARN Column: Constructing trivially true equals predicate, 'c#18232 = c#18232'
> As pointed out by commenter zero323:
> "The second behavior looks indeed like a bug related to the fact that you 
> still have a.c in your data. It looks like it is picked downstream before b.c 
> and the evaluated condition is actually a.newc = a.c"



