[
https://issues.apache.org/jira/browse/SPARK-34807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17482409#comment-17482409
]
Apache Spark commented on SPARK-34807:
--------------------------------------
User 'constzhou' has created a pull request for this issue:
https://github.com/apache/spark/pull/35334
> Push down filter through window after TransposeWindow
> -----------------------------------------------------
>
> Key: SPARK-34807
> URL: https://issues.apache.org/jira/browse/SPARK-34807
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Yuming Wang
> Assignee: Tanel Kiis
> Priority: Major
> Fix For: 3.2.0
>
>
> {code:scala}
> spark.range(10).selectExpr("id AS a", "id AS b", "id AS c", "id AS d").createTempView("t1")
> val df = spark.sql(
> """
> |SELECT *
> | FROM (
> | SELECT b,
> | sum(d) OVER (PARTITION BY a, b),
> | rank() OVER (PARTITION BY a ORDER BY c)
> | FROM t1
> | ) v1
> |WHERE b = 2
> |""".stripMargin)
> {code}
> Current optimized plan:
> {noformat}
> == Optimized Logical Plan ==
> Project [b#221L, sum(d) OVER (PARTITION BY a, b ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#231L, RANK() OVER (PARTITION BY a ORDER BY c ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#232]
> +- Filter (b#221L = 2)
>    +- Window [rank(c#222L) windowspecdefinition(a#220L, c#222L ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS RANK() OVER (PARTITION BY a ORDER BY c ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#232], [a#220L], [c#222L ASC NULLS FIRST]
>       +- Project [b#221L, a#220L, c#222L, sum(d) OVER (PARTITION BY a, b ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#231L]
>          +- Window [sum(d#223L) windowspecdefinition(a#220L, b#221L, specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) AS sum(d) OVER (PARTITION BY a, b ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#231L], [a#220L, b#221L]
>             +- Project [id#218L AS b#221L, id#218L AS d#223L, id#218L AS a#220L, id#218L AS c#222L]
>                +- Range (0, 10, step=1, splits=Some(2))
> {noformat}
> Expected optimized plan:
> {noformat}
> == Optimized Logical Plan ==
> Project [b#221L, sum(d) OVER (PARTITION BY a, b ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#231L, RANK() OVER (PARTITION BY a ORDER BY c ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#232]
> +- Window [sum(d#223L) windowspecdefinition(a#220L, b#221L, specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) AS sum(d) OVER (PARTITION BY a, b ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#231L], [a#220L, b#221L]
>    +- Project [b#221L, d#223L, a#220L, RANK() OVER (PARTITION BY a ORDER BY c ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#232]
>       +- Filter (b#221L = 2)
>          +- Window [rank(c#222L) windowspecdefinition(a#220L, c#222L ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS RANK() OVER (PARTITION BY a ORDER BY c ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#232], [a#220L], [c#222L ASC NULLS FIRST]
>             +- Project [id#218L AS b#221L, id#218L AS d#223L, id#218L AS a#220L, id#218L AS c#222L]
>                +- Range (0, 10, step=1, splits=Some(2))
> {noformat}
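
The two plans above differ only in where Filter (b = 2) sits relative to the two Window nodes. A rough way to see why that placement is safe is the plain-Scala sketch below. It does not use Spark and is not the optimizer's implementation; T1Row, the sample rows, and the helper names are all hypothetical, chosen only so that the partitions contain more than one row. It checks that filtering on b commutes with a window partitioned by (a, b), but not with a rank partitioned by a alone.

```scala
// Hypothetical in-memory stand-in for table t1; this is not Spark code.
final case class T1Row(a: Long, b: Long, c: Long, d: Long)

object WindowPushdownSketch {
  // Made-up sample data: partition a = 1 mixes b = 1 and b = 2.
  val rows: Seq[T1Row] = Seq(
    T1Row(1L, 1L, 10L, 5L),
    T1Row(1L, 2L, 20L, 7L),
    T1Row(1L, 2L, 30L, 11L),
    T1Row(2L, 2L, 40L, 13L)
  )

  // Models sum(d) OVER (PARTITION BY a, b): pair each row with its partition total.
  def sumOverAB(in: Seq[T1Row]): Seq[(T1Row, Long)] = {
    val totals = in.groupBy(r => (r.a, r.b)).map { case (k, g) => k -> g.map(_.d).sum }
    in.map(r => (r, totals((r.a, r.b))))
  }

  // Models rank() OVER (PARTITION BY a ORDER BY c):
  // 1 + number of rows in the same partition with a strictly smaller c.
  def rankOverA(in: Seq[T1Row]): Seq[(T1Row, Int)] =
    in.map(r => (r, 1 + in.count(o => o.a == r.a && o.c < r.c)))

  // The predicate from the WHERE clause.
  def keepB2(r: T1Row): Boolean = r.b == 2L

  // Filter applied above vs. below the (a, b)-partitioned window.
  def sumCommutes: Boolean =
    sumOverAB(rows).filter(p => keepB2(p._1)) == sumOverAB(rows.filter(keepB2))

  // Filter applied above vs. below the a-partitioned rank.
  def rankCommutes: Boolean =
    rankOverA(rows).filter(p => keepB2(p._1)) == rankOverA(rows.filter(keepB2))

  def main(args: Array[String]): Unit = {
    println(s"filter commutes with sum OVER (PARTITION BY a, b): $sumCommutes")
    println(s"filter commutes with rank OVER (PARTITION BY a ORDER BY c): $rankCommutes")
  }
}
```

Because b is one of the partition keys of the sum window, removing rows with b != 2 drops whole partitions and leaves the remaining totals unchanged. The rank window partitions by a only, so removing rows first would change the ranks of the rows that remain, which is why the filter stays above that Window node in the expected plan.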