Hi Charles,
In general, we cannot know if a function is deterministic. Your function might
be rand(seed, max). It might do a JDBC lookup or a REST call. Drill can't know
unless we add some way to declare that a function is deterministic, perhaps a
@Deterministic annotation.
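To make the annotation idea concrete, here is a minimal sketch in plain Java. Everything in it is hypothetical: Drill has no @Deterministic annotation today, and the names DeterministicCheck, AddFunction, RandFunction and isDeterministic are made up for illustration. It shows only how a planner-side check could read such a marker through reflection.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

final class DeterministicCheck {

  /** Hypothetical marker annotation; not part of Drill's UDF API. */
  @Retention(RetentionPolicy.RUNTIME) // must be visible to reflection at run time
  @Target(ElementType.TYPE)
  @interface Deterministic {}

  /** Stand-in for a UDF whose result depends only on its arguments. */
  @Deterministic
  static final class AddFunction {}

  /** Stand-in for a UDF such as rand(seed, max); deliberately left unmarked. */
  static final class RandFunction {}

  /** Returns true only when the author has opted in via the marker. */
  static boolean isDeterministic(Class<?> udfClass) {
    return udfClass.isAnnotationPresent(Deterministic.class);
  }
}
```

The default is "not deterministic", so an unmarked function is never cached by accident; the author has to opt in.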
That said, you can build in caching inside the function. Should your cache be
separate from mine for security reasons? Should the cache be shared across
execution threads on a given node? Local to a single minor fragment?
Aggregates are an example of functions that have internal state; perhaps the
idea can be extended to a function-specific results cache.
Thanks,
- Paul
On Thursday, August 8, 2019, 09:46:12 AM PDT, Charles Givre
<[email protected]> wrote:
Hello Drill Devs,
I have a question about UDFs. Let's say you have a non-trivial UDF called
foo(x,y) which returns some value. Assuming that foo() always returns the same
result for the same arguments, does Drill have any optimizations to avoid
re-running the function?
I was thinking that it might make sense to cache the arguments and results in
memory and before the function is executed, check the cache to see if they're
there. If they are, return the cached results, and if not, execute the
function. I was thinking that for some functions, like date/time functions, we
might want to include something in the code to ensure that the results do not
get cached.
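A minimal sketch of that check-then-execute flow, in plain Java rather than Drill's UDF framework. The class MemoizedFunction and its cacheable flag are hypothetical names for illustration, not anything Drill provides. The cache here is an unbounded HashMap local to one instance and is not thread-safe; a real implementation would need eviction and a decision about sharing across threads.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

/** Hypothetical wrapper that caches results of a two-argument function. */
final class MemoizedFunction<X, Y, R> {
  private final BiFunction<X, Y, R> fn;
  private final boolean cacheable; // false for date/time-style functions
  private final Map<List<Object>, R> cache = new HashMap<>();
  private int evaluations; // how many times fn actually ran

  MemoizedFunction(BiFunction<X, Y, R> fn, boolean cacheable) {
    this.fn = fn;
    this.cacheable = cacheable;
  }

  R apply(X x, Y y) {
    if (!cacheable) {
      // Opted out of caching: always execute.
      evaluations++;
      return fn.apply(x, y);
    }
    // The argument pair is the cache key; List equality is element-wise.
    List<Object> key = Arrays.<Object>asList(x, y);
    if (cache.containsKey(key)) {
      return cache.get(key); // cache hit: skip the expensive call
    }
    evaluations++;
    R result = fn.apply(x, y);
    cache.put(key, result);
    return result;
  }

  int evaluations() {
    return evaluations;
  }
}
```

With cacheable set to true, calling apply(3, 4) twice runs the wrapped function once; with it set to false, every call runs the function.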
Thoughts?
Charles S. Givre, CISSP
Data Scientist, Co-Founder, GTK Cyber LLC
[email protected]: (443) 762-3286