[ https://issues.apache.org/jira/browse/SPARK-6231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14374742#comment-14374742 ]
Michael Armbrust commented on SPARK-6231: ----------------------------------------- [~chinnitv] I don't think that that is the same bug as this one, as here we are working with already resolved trees that can only be constructed by the DataFrame API. Can you open a second JIRA targeted at 1.3.1? Also can you clarify if it does work at the commit you specify? That commit seems likely unrelated. > Join on two tables (generated from same one) is broken > ------------------------------------------------------ > > Key: SPARK-6231 > URL: https://issues.apache.org/jira/browse/SPARK-6231 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 1.3.0, 1.4.0 > Reporter: Davies Liu > Assignee: Michael Armbrust > Priority: Blocker > Labels: DataFrame > > If the two column used in joinExpr come from the same table, they have the > same id, then the joniExpr is explained in wrong way. > {code} > val df = sqlContext.load(path, "parquet") > val txns = df.groupBy("cust_id").agg($"cust_id", > countDistinct($"day_num").as("txns")) > val spend = df.groupBy("cust_id").agg($"cust_id", > sum($"extended_price").as("spend")) > val rmJoin = txns.join(spend, txns("cust_id") === spend("cust_id"), "inner") > scala> rmJoin.explain > == Physical Plan == > CartesianProduct > Filter (cust_id#0 = cust_id#0) > Aggregate false, [cust_id#0], [cust_id#0,CombineAndCount(partialSets#25) AS > txns#7L] > Exchange (HashPartitioning [cust_id#0], 200) > Aggregate true, [cust_id#0], [cust_id#0,AddToHashSet(day_num#2L) AS > partialSets#25] > PhysicalRDD [cust_id#0,day_num#2L], MapPartitionsRDD[1] at map at > newParquet.scala:542 > Aggregate false, [cust_id#17], [cust_id#17,SUM(PartialSum#38) AS spend#8] > Exchange (HashPartitioning [cust_id#17], 200) > Aggregate true, [cust_id#17], [cust_id#17,SUM(extended_price#20) AS > PartialSum#38] > PhysicalRDD [cust_id#17,extended_price#20], MapPartitionsRDD[3] at map at > newParquet.scala:542 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org