Peter Toth created SPARK-33303:
----------------------------------
Summary: Deduplicate deterministic UDF calls
Key: SPARK-33303
URL: https://issues.apache.org/jira/browse/SPARK-33303
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.1.0
Reporter: Peter Toth
We ran into an issue where a customer created a column with an expensive UDF
call and built very complex logic on top of that column as new derived
columns. Due to the `CollapseProject` and `ExtractPythonUDFs` rules, the UDF
is called ~1000 times for each row, which degraded the performance of the query
significantly.
The `ExtractPythonUDFs` rule could deduplicate calls to deterministic UDFs so
as to avoid this performance degradation.
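
To illustrate the idea only, here is a minimal plain-Python sketch of
deduplicating deterministic UDF calls within a single row. It is not Spark's
implementation and does not use any Spark API: `UdfCall`, `evaluate_row`, and
`expensive` are hypothetical names introduced for this example, and a
(name, argument columns) pair stands in for whatever notion of expression
equality the rule would actually use.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class UdfCall:
    """Hypothetical stand-in for one UDF call in a projection."""
    name: str
    args: Tuple[str, ...]       # names of the input columns
    deterministic: bool = True  # non-deterministic calls are never shared


def evaluate_row(
    calls: List[UdfCall],
    udfs: Dict[str, Callable[..., Any]],
    row: Dict[str, Any],
    deduplicate: bool,
) -> List[Any]:
    """Evaluate every call for one row, optionally reusing results of
    identical deterministic calls instead of invoking the UDF again."""
    cache: Dict[Tuple[str, Tuple[str, ...]], Any] = {}
    results = []
    for call in calls:
        key = (call.name, call.args)
        shareable = deduplicate and call.deterministic
        if shareable and key in cache:
            results.append(cache[key])
            continue
        value = udfs[call.name](*(row[a] for a in call.args))
        if shareable:
            cache[key] = value
        results.append(value)
    return results


# Count how often the (pretend) expensive UDF is actually invoked.
invocations = {"count": 0}


def expensive(x):
    invocations["count"] += 1
    return x * 2


# Three identical calls on column "a" plus one call on column "b".
calls = [UdfCall("expensive", ("a",))] * 3 + [UdfCall("expensive", ("b",))]
row = {"a": 1, "b": 5}
udfs = {"expensive": expensive}

invocations["count"] = 0
naive = evaluate_row(calls, udfs, row, deduplicate=False)
naive_count = invocations["count"]

invocations["count"] = 0
deduped = evaluate_row(calls, udfs, row, deduplicate=True)
deduped_count = invocations["count"]
```

Both runs produce the same values, but the deduplicated run invokes the UDF
once per distinct call rather than once per occurrence. Sharing results is
only safe for deterministic UDFs, which is why the sketch checks the flag.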
--
This message was sent by Atlassian Jira
(v8.3.4#803005)