[ https://issues.apache.org/jira/browse/SPARK-11705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004744#comment-15004744 ]
Zhan Zhang commented on SPARK-11705:
------------------------------------

Simple steps to reproduce:

import sqlContext.implicits._
case class SimpleRecord(key: Int, value: String)
def withDF(name: String) = {
  val df = sc.parallelize((0 until 10).map(x => SimpleRecord(x, s"record_$x"))).toDF()
  df.registerTempTable(name)
}
withDF("p")
withDF("s")
withDF("l")

// FROM p, s, l: p and s share no join condition, so the plan contains a CartesianProduct.
val d = sqlContext.sql(s"select p.key, p.value, s.value, l.value from p, s, l where l.key = s.key and p.key = l.key")
d.queryExecution.sparkPlan

res15: org.apache.spark.sql.execution.SparkPlan =
TungstenProject [key#0,value#1,value#3,value#5]
 SortMergeJoin [key#2,key#0], [key#4,key#4]
  CartesianProduct
   Scan PhysicalRDD[key#0,value#1]
   Scan PhysicalRDD[key#2,value#3]
  Scan PhysicalRDD[key#4,value#5]

// FROM s, l, p: every join step has a matching condition, so no CartesianProduct appears.
val d1 = sqlContext.sql(s"select p.key, p.value, s.value, l.value from s, l, p where l.key = s.key and p.key = l.key")
d1.queryExecution.sparkPlan

res16: org.apache.spark.sql.execution.SparkPlan =
TungstenProject [key#0,value#1,value#3,value#5]
 SortMergeJoin [key#4], [key#0]
  TungstenProject [key#4,value#5,value#3]
   SortMergeJoin [key#2], [key#4]
    Scan PhysicalRDD[key#2,value#3]
    Scan PhysicalRDD[key#4,value#5]
  Scan PhysicalRDD[key#0,value#1]

> Eliminate unnecessary Cartesian Join
> ------------------------------------
>
>                 Key: SPARK-11705
>                 URL: https://issues.apache.org/jira/browse/SPARK-11705
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Zhan Zhang
>
> When we have queries similar to the following (I don’t remember the exact
> form):
> select * from a, b, c, d where a.key1 = c.key1 and b.key2 = c.key2 and c.key3
> = d.key3
> there will be a cartesian join between a and b. But if we simply change the
> table order, for example to "from a, c, b, d", the cartesian join is
> eliminated.
> Without such manual tuning, the query will never finish if a and b are big.
> But we should not rely on such manual optimization.
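The effect of table order described in the issue can be sketched with a small model. This is not Spark's planner. It assumes tables are joined left to right in FROM order, and that a step needs a Cartesian product whenever no equality predicate links the new table to the tables already joined. The names JoinOrderSketch, Pred, and cartesianSteps are hypothetical and exist only for this illustration.

```scala
// Hypothetical model, NOT Spark's planner: join tables left to right in
// FROM-clause order and report every table that has to be attached with a
// Cartesian product because no predicate links it to the tables joined so far.
object JoinOrderSketch {
  // An equi-join predicate between two tables, e.g. Pred("l", "s") for l.key = s.key.
  final case class Pred(left: String, right: String)

  def cartesianSteps(from: Seq[String], preds: Seq[Pred]): Seq[String] = {
    require(from.nonEmpty, "FROM list must contain at least one table")
    val start = (Set(from.head), Vector.empty[String])
    from.tail.foldLeft(start) { case ((joined, cartesian), next) =>
      // The step is a proper join only if some predicate connects `next`
      // to a table that is already part of the left side.
      val connected = preds.exists { p =>
        (p.left == next && joined(p.right)) || (p.right == next && joined(p.left))
      }
      (joined + next, if (connected) cartesian else cartesian :+ next)
    }._2
  }
}
```

Under these assumptions, the predicates from the reproduction (l.key = s.key and p.key = l.key) give Vector("s") for the order p, s, l, meaning s is attached to p without a join condition, and an empty result for the order s, l, p. This matches the two plans shown in the comment.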