[ 
https://issues.apache.org/jira/browse/SPARK-28654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-28654.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0

Issue resolved by pull request 25386
[https://github.com/apache/spark/pull/25386]

> Move "Extract Python UDFs" to the last in optimizer
> ---------------------------------------------------
>
>                 Key: SPARK-28654
>                 URL: https://issues.apache.org/jira/browse/SPARK-28654
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>             Fix For: 3.0.0
>
>
> Plans produced after "Extract Python UDFs" are fragile and easily broken by
> other optimizer rules. For instance, if we add some rules such as
> {{PushDownPredicates}}, the optimization is rolled back, as shown below:
> {code}
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.PushDownPredicates ===
> !Filter (dummyUDF(a#7, c#18) = dummyUDF(d#19, c#18))   Join Cross, (dummyUDF(a#7, c#18) = dummyUDF(d#19, c#18))
> !+- Join Cross                                         :- Project [_1#2 AS a#7, _2#3 AS b#8]
> !   :- Project [_1#2 AS a#7, _2#3 AS b#8]              :  +- LocalRelation [_1#2, _2#3]
> !   :  +- LocalRelation [_1#2, _2#3]                   +- Project [_1#13 AS c#18, _2#14 AS d#19]
> !   +- Project [_1#13 AS c#18, _2#14 AS d#19]             +- LocalRelation [_1#13, _2#14]
> !      +- LocalRelation [_1#13, _2#14]
> {code}
> It seems we should handle the Python UDF cases last, even after the post-hoc rules.
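
The rollback described above can be illustrated with a toy model of rule ordering. This is a minimal sketch in plain Python, not Spark code: the plan classes (`Relation`, `Join`, `Filter`), the two rules (`lift_udf_out_of_join`, `push_down_predicates`), and the `optimize` driver are all hypothetical names invented for illustration, and the rules only inspect the root of the plan. It assumes that the UDF-related rule turns a join with a UDF condition into a filter over a cross join, as the left-hand side of the plan diff suggests.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Union


@dataclass(frozen=True)
class Relation:
    name: str


@dataclass(frozen=True)
class Join:
    left: "Plan"
    right: "Plan"
    condition: Optional[str] = None  # None models a plain cross join


@dataclass(frozen=True)
class Filter:
    condition: str
    child: "Plan"


Plan = Union[Relation, Join, Filter]
Rule = Callable[[Plan], Plan]


def has_udf(condition: Optional[str]) -> bool:
    # Toy check (assumption): a condition "uses a UDF" if it mentions dummyUDF.
    return condition is not None and "dummyUDF" in condition


def lift_udf_out_of_join(plan: Plan) -> Plan:
    # Hypothetical stand-in for the UDF-related rule:
    # Join(cond with UDF) -> Filter(cond) over a cross join.
    if isinstance(plan, Join) and has_udf(plan.condition):
        return Filter(plan.condition, Join(plan.left, plan.right, None))
    return plan


def push_down_predicates(plan: Plan) -> Plan:
    # Hypothetical stand-in for predicate pushdown:
    # Filter directly over an unconditioned join -> fold the predicate into the join.
    if (
        isinstance(plan, Filter)
        and isinstance(plan.child, Join)
        and plan.child.condition is None
    ):
        join = plan.child
        return Join(join.left, join.right, plan.condition)
    return plan


def optimize(plan: Plan, rules: List[Rule]) -> Plan:
    # Apply each rule once, in the given order.
    for rule in rules:
        plan = rule(plan)
    return plan


cond = "dummyUDF(a, c) = dummyUDF(d, c)"
start = Join(Relation("left"), Relation("right"), cond)

# UDF rule first, pushdown afterwards: the pushdown undoes the UDF rule.
rolled_back = optimize(start, [lift_udf_out_of_join, push_down_predicates])

# Pushdown first, UDF rule last: the Filter-over-cross-join shape survives.
kept = optimize(start, [push_down_predicates, lift_udf_out_of_join])
```

In this sketch, `rolled_back` ends up equal to `start`, while `kept` remains a `Filter` over a cross join. This mirrors the motivation for running the Python UDF handling after every other rule.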



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
