[ https://issues.apache.org/jira/browse/SPARK-6231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372447#comment-14372447 ]
Shivaram Venkataraman commented on SPARK-6231:
----------------------------------------------

[~marmbrus] I've sent the dataset to you by email. The code that used to trigger this bug is at https://gist.github.com/shivaram/4ff0a9c226dda2030507

> Join on two tables (generated from same one) is broken
> ------------------------------------------------------
>
>                 Key: SPARK-6231
>                 URL: https://issues.apache.org/jira/browse/SPARK-6231
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.0, 1.4.0
>            Reporter: Davies Liu
>            Assignee: Michael Armbrust
>            Priority: Blocker
>              Labels: DataFrame
>
> If the two columns used in the joinExpr come from the same table, they carry the
> same expression id, so the joinExpr is resolved incorrectly: the join condition
> degenerates into the trivially true comparison (cust_id#0 = cust_id#0), and the
> planner falls back to a CartesianProduct followed by a Filter.
> {code}
> val df = sqlContext.load(path, "parquet")
> val txns = df.groupBy("cust_id").agg($"cust_id", countDistinct($"day_num").as("txns"))
> val spend = df.groupBy("cust_id").agg($"cust_id", sum($"extended_price").as("spend"))
> val rmJoin = txns.join(spend, txns("cust_id") === spend("cust_id"), "inner")
>
> scala> rmJoin.explain
> == Physical Plan ==
> CartesianProduct
>  Filter (cust_id#0 = cust_id#0)
>   Aggregate false, [cust_id#0], [cust_id#0,CombineAndCount(partialSets#25) AS txns#7L]
>    Exchange (HashPartitioning [cust_id#0], 200)
>     Aggregate true, [cust_id#0], [cust_id#0,AddToHashSet(day_num#2L) AS partialSets#25]
>      PhysicalRDD [cust_id#0,day_num#2L], MapPartitionsRDD[1] at map at newParquet.scala:542
>   Aggregate false, [cust_id#17], [cust_id#17,SUM(PartialSum#38) AS spend#8]
>    Exchange (HashPartitioning [cust_id#17], 200)
>     Aggregate true, [cust_id#17], [cust_id#17,SUM(extended_price#20) AS PartialSum#38]
>      PhysicalRDD [cust_id#17,extended_price#20], MapPartitionsRDD[3] at map at newParquet.scala:542
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
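As an aside, a workaround commonly used for this class of self-join ambiguity (a sketch, not taken from the ticket; it assumes the Spark 1.3 DataFrame API and the same `path` as in the report) is to rename the join column on one side before joining, so the two plans no longer share an attribute id and the equality can resolve to two distinct attributes:

{code}
// Sketch of a possible workaround: rename "cust_id" in one branch so the
// join condition compares two different attribute ids instead of
// degenerating into cust_id#0 = cust_id#0.
val df = sqlContext.load(path, "parquet")
val txns = df.groupBy("cust_id")
  .agg($"cust_id", countDistinct($"day_num").as("txns"))
val spend = df.groupBy("cust_id")
  .agg($"cust_id", sum($"extended_price").as("spend"))
  .withColumnRenamed("cust_id", "spend_cust_id")

// With distinct column names on each side, the planner can choose an
// equi-join strategy rather than CartesianProduct + trivially-true Filter.
val rmJoin = txns.join(spend, txns("cust_id") === spend("spend_cust_id"), "inner")
{code}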