Sunitha Kambhampati created SPARK-27692:
-------------------------------------------

             Summary: Optimize evaluation of udf that is deterministic and has literal inputs
                 Key: SPARK-27692
                 URL: https://issues.apache.org/jira/browse/SPARK-27692
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Sunitha Kambhampati


A deterministic UDF is a UDF for which the following holds: given a specific 
input, the output is the same no matter how many times the UDF is 
executed.

When all inputs to a UDF are literals and the UDF is deterministic, we can 
evaluate the UDF once and reuse its output, instead of evaluating 
the UDF again for every row in the query. 

This is valid only if the UDF is deterministic and all inputs are literals.  
Otherwise the optimization must not be applied, because the result could 
differ from one evaluation to the next. 
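
The idea can be illustrated with a small, self-contained sketch. This is a toy model in plain Python, not Spark code and not the implementation in the PR: the names `Literal`, `Column`, `UdfCall`, `fold_literal_udf`, `evaluate` and `expensive` are all hypothetical and exist only for this example. It shows a rewrite step that replaces a deterministic UDF call over literal-only arguments with its precomputed result, and leaves every other call untouched.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class Literal:
    """A constant value known at planning time (hypothetical toy type)."""
    value: Any


@dataclass(frozen=True)
class Column:
    """A per-row input; its value is unknown until execution."""
    name: str


@dataclass(frozen=True)
class UdfCall:
    """A call to a user-defined function over argument expressions."""
    func: Callable[..., Any]
    args: Tuple[Any, ...]
    deterministic: bool = True


def fold_literal_udf(expr):
    """Fold a deterministic UDF call with literal-only args into a Literal."""
    if (
        isinstance(expr, UdfCall)
        and expr.deterministic
        and all(isinstance(a, Literal) for a in expr.args)
    ):
        # Evaluate once at planning time instead of once per row.
        return Literal(expr.func(*(a.value for a in expr.args)))
    # Non-deterministic UDF or non-literal input: leave the call as it is.
    return expr


def evaluate(expr, row):
    """Evaluate an expression against a single row (a dict)."""
    if isinstance(expr, Literal):
        return expr.value
    if isinstance(expr, Column):
        return row[expr.name]
    return expr.func(*(evaluate(a, row) for a in expr.args))


# Count invocations to make the saving visible.
calls = {"n": 0}


def expensive(x):
    calls["n"] += 1
    return x * 2


rows = [{"id": i} for i in range(1000)]

# Literal input + deterministic UDF: folded to a constant before execution.
folded = fold_literal_udf(UdfCall(expensive, (Literal(21),)))
results = [evaluate(folded, r) for r in rows]

# Column input: the rewrite does not apply and the call is kept.
not_folded = fold_literal_udf(UdfCall(expensive, (Column("id"),)))
```

In this sketch `expensive` runs once while folding and never again during the 1000 row evaluations, whereas the call over `Column("id")` is returned unchanged and would still run once per row.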

*Testing:* 

We have used this internally and, as expected, have seen significant 
performance improvements for some very expensive UDFs.

In the PR, I have added unit tests. 

*Credits:* 

Thanks to Guy Khazma ([https://github.com/guykhazma]) from the IBM Haifa 
Research Team for the idea and the original implementation. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
