Sunitha Kambhampati created SPARK-27692:
-------------------------------------------
Summary: Optimize evaluation of udf that is deterministic and has literal inputs
Key: SPARK-27692
URL: https://issues.apache.org/jira/browse/SPARK-27692
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.0
Reporter: Sunitha Kambhampati
A deterministic UDF is one for which the following is true: given a specific
input, the output of the UDF is the same no matter how many times the UDF is
executed.
When all inputs to a UDF are literals and the UDF is deterministic, we can
optimize the query by evaluating the UDF once and reusing its output, instead of
evaluating the UDF again for every row in the query.
This is valid only if the UDF is deterministic and all of its inputs are
literals. Otherwise, this optimization cannot and must not be applied.
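To illustrate the idea, here is a minimal, self-contained Python sketch built on a hypothetical toy expression tree. The names `Literal`, `Column`, `Udf`, `fold_udfs`, and `evaluate` are illustrative only; they are not Spark's Catalyst API and not the implementation in the PR. The sketch folds a deterministic UDF whose inputs are all literals into a single literal before any rows are processed, so the UDF body runs once instead of once per row.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

# Hypothetical toy expression nodes; NOT Spark's Catalyst classes.

@dataclass(frozen=True)
class Literal:
    value: Any

@dataclass(frozen=True)
class Column:
    name: str

@dataclass(frozen=True)
class Udf:
    func: Callable[..., Any]
    children: Tuple[Any, ...]
    deterministic: bool = True


def fold_udfs(expr):
    """Replace a deterministic UDF whose inputs are all literals with a Literal of its result."""
    if isinstance(expr, Udf):
        # Fold children first, so nested literal-only UDFs collapse bottom-up.
        children = tuple(fold_udfs(c) for c in expr.children)
        if expr.deterministic and all(isinstance(c, Literal) for c in children):
            # Safe only because the UDF is deterministic: evaluate it exactly once.
            return Literal(expr.func(*(c.value for c in children)))
        # Column input or non-deterministic UDF: keep the per-row call.
        return Udf(expr.func, children, expr.deterministic)
    return expr


def evaluate(expr, row):
    """Evaluate an expression against one row (a dict of column name -> value)."""
    if isinstance(expr, Literal):
        return expr.value
    if isinstance(expr, Column):
        return row[expr.name]
    return expr.func(*(evaluate(c, row) for c in expr.children))


# Usage: count how often a (pretend) expensive UDF actually runs.
calls = {"n": 0}

def expensive(x, y):
    calls["n"] += 1
    return x * y

rows = [{"id": i} for i in range(1000)]
folded = fold_udfs(Udf(expensive, (Literal(6), Literal(7))))
results = [evaluate(folded, r) for r in rows]
```

In this sketch the UDF runs once for all 1,000 rows. If any input is a `Column`, or the UDF is marked `deterministic=False`, `fold_udfs` leaves the call in place and it is evaluated per row. Note that a deterministic UDF with no arguments is also folded here, since all (zero) of its inputs are literals.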
*Testing:*
We have used this internally and have seen significant performance improvements
for some very expensive UDFs (as expected).
I have added unit tests in the PR.
*Credits:*
Thanks to Guy Khazma ([https://github.com/guykhazma]) from the IBM Haifa
Research Team for the idea and the original implementation.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)