Re: [QUESTION]: Caching UDFs

Paul Rogers Thu, 08 Aug 2019 09:56:25 -0700

Hi Charles,

In general, we cannot know if a function is deterministic. Your function might 
be rand(seed, max). It might do a JDBC lookup or a REST call. Drill can't know 
(unless we add some way to know that a function is deterministic: maybe a 
@Deterministic annotation.)
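
To make the annotation idea concrete, here is a minimal sketch in plain Java. Everything in it is hypothetical: Drill does not define a @Deterministic annotation, and FooFunction, RandFunction and DeterminismCheck are illustrative names, not Drill classes. It only shows how a planner-side check could tell which functions have opted in.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker annotation; not part of Drill.
// RUNTIME retention lets the engine inspect it via reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Deterministic {}

// Hypothetical UDF whose author promises: same arguments, same result.
@Deterministic
class FooFunction {}

// Hypothetical UDF with no such promise (random values, JDBC lookup, REST call).
class RandFunction {}

class DeterminismCheck {
    // Only functions that explicitly opt in are eligible for result caching.
    static boolean isCacheable(Class<?> udfClass) {
        return udfClass.isAnnotationPresent(Deterministic.class);
    }
}
```

With an opt-in design like this, unannotated functions (date/time functions, for example) are never cached by default.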


That said, you can build caching into the function itself. Doing so raises a 
few design questions. Should your cache be separate from mine for security 
reasons? Should the cache be shared across execution threads on a given node, 
or be local to a single minor fragment?

Aggregates are an example of functions that have internal state; perhaps the 
idea can be extended to a function-specific results cache.
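
A rough sketch of what a function-local results cache might look like, in plain Java. ResultCache is a hypothetical helper, not a Drill API, and the sketch assumes the simplest scope: one cache per function instance, used by a single thread (roughly "local to a single minor fragment"). It is bounded with LRU eviction so memory use stays predictable.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical helper, not part of Drill: memoizes a two-argument function
// in a bounded LRU map. Not thread-safe; intended for one thread only.
final class ResultCache<A, B, R> {
    private final Map<List<Object>, R> cache;
    private final BiFunction<A, B, R> fn;
    private int misses;

    ResultCache(final int maxEntries, BiFunction<A, B, R> fn) {
        this.fn = fn;
        // accessOrder = true makes iteration order least-recently-used first.
        this.cache = new LinkedHashMap<List<Object>, R>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<List<Object>, R> eldest) {
                return size() > maxEntries;
            }
        };
    }

    R apply(A x, B y) {
        // Arrays.asList tolerates null arguments and has value-based equals/hashCode.
        List<Object> key = Arrays.<Object>asList(x, y);
        if (cache.containsKey(key)) {
            return cache.get(key);
        }
        misses++;
        R result = fn.apply(x, y);   // the expensive call runs only on a miss
        cache.put(key, result);
        return result;
    }

    // Number of times the underlying function was actually invoked.
    int misses() {
        return misses;
    }
}
```

Sharing one cache across execution threads on a node would need a different design (a concurrent map or explicit locking), and sharing across users brings back the security question above.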

Thanks,
- Paul

 

    On Thursday, August 8, 2019, 09:46:12 AM PDT, Charles Givre 
<[email protected]> wrote:  
 
 Hello Drill Devs,

 I have a question about UDFs.  Let's say you have a non-trivial UDF called 
 foo(x,y) which returns some value.  Assuming that foo() returns the same 
 result whenever its arguments are the same, does Drill have any optimizations 
 to avoid re-running the non-trivial function?

 I was thinking that it might make sense to cache the arguments and results in 
 memory and, before the function is executed, check the cache to see if they're 
 there.  If they are, return the cached result; if not, execute the function.  
 For some functions, like date/time functions, we might want to include 
 something in the code to ensure that the results do not get cached.

Thoughts?

Charles S. Givre, CISSP
Data Scientist, Co-Founder, GTK Cyber LLC
[email protected]: (443) 762-3286
