[jira] [Commented] (SPARK-11705) Eliminate unnecessary Cartesian Join

Zhan Zhang (JIRA) Fri, 13 Nov 2015 13:30:30 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-11705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004744#comment-15004744
 ]


Zhan Zhang commented on SPARK-11705:
------------------------------------

simple reproduce step:
import sqlContext.implicits._
case class SimpleRecord(key: Int, value: String)
def withDF(name: String) = {
  val df =  sc.parallelize((0 until 10).map(x => SimpleRecord(x, 
s"record_$x"))).toDF()
  df.registerTempTable(name)
}
withDF("p")
withDF("s")
withDF("l")

val d = sqlContext.sql(s"select p.key, p.value, s.value, l.value from p, s, l 
where l.key = s.key and p.key = l.key")
d.queryExecution.sparkPlan

res15: org.apache.spark.sql.execution.SparkPlan =
TungstenProject [key#0,value#1,value#3,value#5]
 SortMergeJoin [key#2,key#0], [key#4,key#4]
  CartesianProduct
   Scan PhysicalRDD[key#0,value#1]
   Scan PhysicalRDD[key#2,value#3]
  Scan PhysicalRDD[key#4,value#5]


val d1 = sqlContext.sql(s"select p.key, p.value, s.value, l.value from s, l, p 
where l.key = s.key and p.key = l.key")
d1.queryExecution.sparkPlan

res16: org.apache.spark.sql.execution.SparkPlan =
TungstenProject [key#0,value#1,value#3,value#5]
 SortMergeJoin [key#4], [key#0]
  TungstenProject [key#4,value#5,value#3]
   SortMergeJoin [key#2], [key#4]
    Scan PhysicalRDD[key#2,value#3]
    Scan PhysicalRDD[key#4,value#5]
  Scan PhysicalRDD[key#0,value#1]

> Eliminate unnecessary Cartesian Join
> ------------------------------------
>
>                 Key: SPARK-11705
>                 URL: https://issues.apache.org/jira/browse/SPARK-11705
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Zhan Zhang
>
> When we have some queries similar to following (don’t remember the exact 
> form):
> select * from a, b, c, d where a.key1 = c.key1 and b.key2 = c.key2 and c.key3 
> = d.key3
> There will be a cartesian join between a and b. But if we just simply change 
> the table order, for example from a, c, b, d, such cartesian join are 
> eliminated.
> Without such manual tuning, the query will never finish if a, b are big. But 
> we should not relies on such manual optimization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-11705) Eliminate unnecessary Cartesian Join

Reply via email to