[jira] [Created] (SPARK-28654) Move "Extract Python UDFs" to the last in optimizer

Hyukjin Kwon (JIRA) Thu, 08 Aug 2019 00:33:34 -0700

Hyukjin Kwon created SPARK-28654:
------------------------------------

             Summary: Move "Extract Python UDFs" to the last in optimizer
                 Key: SPARK-28654
                 URL: https://issues.apache.org/jira/browse/SPARK-28654
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Hyukjin Kwon



Plans produced after "Extract Python UDFs" are fragile and easily broken by
rules that run afterwards. For instance, if we add a rule such as
{{PushDownPredicates}}, the optimization is rolled back as below:

{code}
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.PushDownPredicates ===
!Filter (dummyUDF(a#7, c#18) = dummyUDF(d#19, c#18))   Join Cross, (dummyUDF(a#7, c#18) = dummyUDF(d#19, c#18))
!+- Join Cross                                         :- Project [_1#2 AS a#7, _2#3 AS b#8]
!   :- Project [_1#2 AS a#7, _2#3 AS b#8]              :  +- LocalRelation [_1#2, _2#3]
!   :  +- LocalRelation [_1#2, _2#3]                   +- Project [_1#13 AS c#18, _2#14 AS d#19]
!   +- Project [_1#13 AS c#18, _2#14 AS d#19]             +- LocalRelation [_1#13, _2#14]
!      +- LocalRelation [_1#13, _2#14]
{code}
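
The rewrite in the log above can be mimicked with a small toy model. This is only a sketch in plain Python, not Spark code: the {{Relation}}, {{Join}} and {{Filter}} classes and the {{push_down_predicates}} function are hypothetical stand-ins, and the rule covers just the one case shown in the log (a Filter directly on top of a join that has no condition).

```python
from dataclasses import dataclass
from typing import Optional


# Toy plan nodes. These are hypothetical stand-ins, not Spark classes.
@dataclass(frozen=True)
class Relation:
    name: str


@dataclass(frozen=True)
class Join:
    left: object
    right: object
    condition: Optional[str] = None  # None models a cross join with no condition


@dataclass(frozen=True)
class Filter:
    condition: str
    child: object


def push_down_predicates(plan):
    """Toy rule: fold a Filter sitting directly on an unconditioned Join
    into that join's condition. Any other plan is returned unchanged."""
    if (
        isinstance(plan, Filter)
        and isinstance(plan.child, Join)
        and plan.child.condition is None
    ):
        join = plan.child
        return Join(join.left, join.right, plan.condition)
    return plan


# Shape of the plan on the left-hand side of the log: Filter over a cross join.
cond = "dummyUDF(a, c) = dummyUDF(d, c)"
before = Filter(cond, Join(Relation("left"), Relation("right")))

# After the toy rule, the UDF comparison is the join condition again,
# which matches the shape on the right-hand side of the log.
after = push_down_predicates(before)
```

In this model the Filter node disappears and the UDF comparison ends up inside the join, so any rule that had deliberately kept the UDF out of the join condition would see its work undone.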

It seems we should handle the Python UDF cases last, even after the post hoc rules.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

