Hyukjin Kwon created SPARK-28654:
------------------------------------

             Summary: Move "Extract Python UDFs" to the last in optimizer
                 Key: SPARK-28654
                 URL: https://issues.apache.org/jira/browse/SPARK-28654
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Hyukjin Kwon

Plans produced after "Extract Python UDFs" are fragile and easily broken by other rules. For instance, if we add a rule such as {{PushDownPredicates}} afterwards, the optimization is rolled back as below (the filter containing the Python UDF is pushed back into the join condition):

{code}
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.PushDownPredicates ===
!Filter (dummyUDF(a#7, c#18) = dummyUDF(d#19, c#18))   Join Cross, (dummyUDF(a#7, c#18) = dummyUDF(d#19, c#18))
!+- Join Cross                                          :- Project [_1#2 AS a#7, _2#3 AS b#8]
!   :- Project [_1#2 AS a#7, _2#3 AS b#8]               :  +- LocalRelation [_1#2, _2#3]
!   :  +- LocalRelation [_1#2, _2#3]                    +- Project [_1#13 AS c#18, _2#14 AS d#19]
!   +- Project [_1#13 AS c#18, _2#14 AS d#19]              +- LocalRelation [_1#13, _2#14]
!      +- LocalRelation [_1#13, _2#14]
{code}

It seems we should handle the Python UDF cases last, even after the post-hoc rules.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)