Takeshi Yamamuro created SPARK-23172:
----------------------------------------
             Summary: Respect Project nodes in ReorderJoin
                 Key: SPARK-23172
                 URL: https://issues.apache.org/jira/browse/SPARK-23172
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.2.1
            Reporter: Takeshi Yamamuro

The current `ReorderJoin` optimizer rule cannot flatten a pattern `Join -> Project -> Join` because `ExtractFiltersAndInnerJoins` doesn't handle `Project` nodes. So, the current master cannot reorder joins in the query below:
{code}
val df1 = spark.range(100).selectExpr("id % 10 AS k0", s"id % 10 AS k1", s"id % 10 AS k2", "id AS v1")
val df2 = spark.range(10).selectExpr("id AS k0", "id AS v2")
val df3 = spark.range(10).selectExpr("id AS k1", "id AS v3")
val df4 = spark.range(10).selectExpr("id AS k2", "id AS v4")
df1.join(df2, "k0").join(df3, "k1").join(df4, "k2").explain(true)

== Analyzed Logical Plan ==
k2: bigint, k1: bigint, k0: bigint, v1: bigint, v2: bigint, v3: bigint, v4: bigint
Project [k2#5L, k1#4L, k0#3L, v1#6L, v2#16L, v3#24L, v4#32L]
+- Join Inner, (k2#5L = k2#31L)
   :- Project [k1#4L, k0#3L, k2#5L, v1#6L, v2#16L, v3#24L]
   :  +- Join Inner, (k1#4L = k1#23L)
   :     :- Project [k0#3L, k1#4L, k2#5L, v1#6L, v2#16L]
   :     :  +- Join Inner, (k0#3L = k0#15L)
   :     :     :- Project [(id#0L % cast(10 as bigint)) AS k0#3L, (id#0L % cast(10 as bigint)) AS k1#4L, (id#0L % cast(10 as bigint)) AS k2#5L, id#0L AS v1#6L]
   :     :     :  +- Range (0, 100, step=1, splits=Some(4))
   :     :     +- Project [id#12L AS k0#15L, id#12L AS v2#16L]
   :     :        +- Range (0, 10, step=1, splits=Some(4))
   :     +- Project [id#20L AS k1#23L, id#20L AS v3#24L]
   :        +- Range (0, 10, step=1, splits=Some(4))
   +- Project [id#28L AS k2#31L, id#28L AS v4#32L]
      +- Range (0, 10, step=1, splits=Some(4))
{code}
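To make the `Join -> Project -> Join` problem concrete, here is a minimal, self-contained sketch of the flattening step on a toy plan model. It is not Catalyst code: `Plan`, `Relation`, `Project`, `Join`, `FlattenSketch`, and both `flatten*` functions are hypothetical stand-ins, and join conditions are plain strings. The second function shows one possible way to look through a `Project` that sits directly on top of a `Join`. A real rule would presumably also have to check that such a `Project` only selects attributes (no aliases or expressions) and restore the projection after reordering, which this sketch ignores.

```scala
// Toy logical-plan model. These classes are hypothetical stand-ins,
// not Spark's Catalyst classes.
sealed trait Plan
case class Relation(name: String) extends Plan
case class Project(columns: Seq[String], child: Plan) extends Plan
case class Join(left: Plan, right: Plan, condition: String) extends Plan

object FlattenSketch {
  // Mimics the behavior described in the issue: the recursion follows the
  // left side of a Join and stops at any node that is not a Join.
  def flattenJoinsOnly(plan: Plan): (Seq[Plan], Seq[String]) = plan match {
    case Join(left, right, cond) =>
      val (inputs, conds) = flattenJoinsOnly(left)
      (inputs :+ right, conds :+ cond)
    case other => (Seq(other), Seq.empty)
  }

  // Assumed variant: additionally looks through a Project whose child is a Join.
  // The Project itself is dropped here; a real rule would need to re-apply it.
  def flattenThroughProject(plan: Plan): (Seq[Plan], Seq[String]) = plan match {
    case Join(left, right, cond) =>
      val (inputs, conds) = flattenThroughProject(left)
      (inputs :+ right, conds :+ cond)
    case Project(_, child: Join) => flattenThroughProject(child)
    case other => (Seq(other), Seq.empty)
  }

  // Toy counterpart of the query in the issue (below the top-level Project).
  val df1: Plan = Project(Seq("k0", "k1", "k2", "v1"), Relation("range100"))
  val df2: Plan = Project(Seq("k0", "v2"), Relation("range10_a"))
  val df3: Plan = Project(Seq("k1", "v3"), Relation("range10_b"))
  val df4: Plan = Project(Seq("k2", "v4"), Relation("range10_c"))

  val j1: Plan = Project(Seq("k0", "k1", "k2", "v1", "v2"), Join(df1, df2, "k0 = k0"))
  val j2: Plan = Project(Seq("k1", "k0", "k2", "v1", "v2", "v3"), Join(j1, df3, "k1 = k1"))
  val top: Plan = Join(j2, df4, "k2 = k2")

  def main(args: Array[String]): Unit = {
    val (stopped, _) = flattenJoinsOnly(top)
    val (flattened, conds) = flattenThroughProject(top)
    println(s"joins only:      ${stopped.size} inputs")
    println(s"through Project: ${flattened.size} inputs, ${conds.size} conditions")
  }
}
```

On this toy plan, `flattenJoinsOnly` sees only two inputs (the intermediate `Project` and `df4`), so there is nothing to reorder, while `flattenThroughProject` collects all four leaf inputs and three join conditions.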