Julian Hyde created CALCITE-4564:
------------------------------------
Summary: Initialization context for non-static user-defined
functions (UDFs)
Key: CALCITE-4564
URL: https://issues.apache.org/jira/browse/CALCITE-4564
Project: Calcite
Issue Type: Bug
Reporter: Julian Hyde
I propose to allow user-defined functions (UDFs) to read from an initialization
context during construction. The initialization context would be a new Java
{{interface UdfInitializer}} that provides, among other things, a type factory
and the values of the arguments to the function call whose values are literals.
The purpose of this feature is to allow functions to do more work at
initialization time and less work on each invocation. Suppose I wanted to write
a UDF {{regexMatch(pattern, string)}} that matches Java regular expressions. If
{{pattern}} is a literal, I would like to create an instance of the function
object that calls {{Pattern.compile(pattern)}} in its constructor and stores
the resulting {{Pattern}} object as a field. Each invocation of the function
can use that {{Pattern}} object, and does not have to pay the cost of
compilation.
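A minimal sketch of the {{regexMatch}} example follows. Since {{UdfInitializer}} is only proposed here, the interface shape below, including the method names {{argCount}} and {{literalArg}}, is an assumption made for illustration (the type factory is omitted), and {{RegexMatchFunction}} and {{RegexMatchDemo}} are hypothetical names.

```java
import java.util.regex.Pattern;

/** Hypothetical shape of the proposed interface; the real methods are not yet decided. */
interface UdfInitializer {
  /** Number of arguments in the function call (assumed method). */
  int argCount();

  /** Literal value of argument i, or null if that argument is not a literal (assumed method). */
  Object literalArg(int i);
}

/** UDF regexMatch(pattern, string) that compiles a literal pattern once, in its constructor. */
class RegexMatchFunction {
  /** Pre-compiled pattern, or null when the pattern argument is not a literal. */
  private final Pattern compiled;

  public RegexMatchFunction(UdfInitializer init) {
    Object p = init.literalArg(0);
    this.compiled = p instanceof String ? Pattern.compile((String) p) : null;
  }

  /** Non-static eval method; falls back to per-call compilation for non-literal patterns. */
  public boolean eval(String pattern, String s) {
    Pattern p = compiled != null ? compiled : Pattern.compile(pattern);
    return p.matcher(s).matches();
  }
}

class RegexMatchDemo {
  /** Builds a stand-in initializer from literal values; null marks a non-literal argument. */
  static UdfInitializer literals(Object... values) {
    return new UdfInitializer() {
      public int argCount() { return values.length; }
      public Object literalArg(int i) { return values[i]; }
    };
  }

  public static void main(String[] args) {
    // Pattern is a literal, string is not: compilation happens once, in the constructor.
    RegexMatchFunction f = new RegexMatchFunction(literals("a+b", null));
    System.out.println(f.eval("a+b", "aaab"));
    System.out.println(f.eval("a+b", "abc"));
  }
}
```

When the pattern is not a literal, the sketch still works but compiles the pattern on every call, which is the cost the proposal aims to avoid in the literal case.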
In order to use this feature, a UDF class would have a public constructor with
a single argument that is a {{UdfInitializer}}. The method that invokes the
function, conventionally called {{eval}}, must be non-static.
This feature is optional. A UDF that has a public constructor with zero
arguments (which is the current contract for non-static UDFs) will continue to
work. [class
MyPlusFunction|https://github.com/apache/calcite/blob/4bc916619fd286b2c0cc4d5c653c96a68801d74e/core/src/test/java/org/apache/calcite/util/Smalls.java#L429]
is an example of this kind of UDF.
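One way the two constructor contracts could coexist is sketched below: prefer a public constructor that takes a {{UdfInitializer}}, and fall back to the public zero-argument constructor. This is an illustration under assumptions, not Calcite's implementation; {{UdfInstantiator}}, {{PlusFunction}}, {{PlusConstFunction}} and the {{literalArg}} method are hypothetical.

```java
import java.lang.reflect.Constructor;

/** Hypothetical shape of the proposed interface (assumed single method). */
interface UdfInitializer {
  /** Literal value of argument i, or null if that argument is not a literal. */
  Object literalArg(int i);
}

class UdfInstantiator {
  /** Uses the (UdfInitializer) constructor if present, otherwise the zero-argument one. */
  static <T> T instantiate(Class<T> udfClass, UdfInitializer init) {
    try {
      try {
        Constructor<T> ctor = udfClass.getConstructor(UdfInitializer.class);
        return ctor.newInstance(init);
      } catch (NoSuchMethodException e) {
        // Current contract for non-static UDFs: public constructor with zero arguments.
        return udfClass.getConstructor().newInstance();
      }
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot instantiate UDF " + udfClass.getName(), e);
    }
  }
}

/** Existing-style UDF: zero-argument constructor, non-static eval. */
class PlusFunction {
  public PlusFunction() {}

  public int eval(int x, int y) {
    return x + y;
  }
}

/** New-style UDF: reads the second argument at construction time if it is a literal. */
class PlusConstFunction {
  private final Integer literalY; // null when y is not a literal

  public PlusConstFunction(UdfInitializer init) {
    Object v = init.literalArg(1);
    this.literalY = v instanceof Integer ? (Integer) v : null;
  }

  public int eval(int x, int y) {
    return x + (literalY != null ? literalY : y);
  }
}
```

Both classes go through the same {{instantiate}} call, so existing UDFs would need no changes under this scheme.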
This feature would apply to all UDFs, including table functions (i.e. those
whose arguments are tables or which return tables) and aggregate functions.
The initialization context would not affect the type derivation aspects of the
function. The return type, operand types, and so forth will already have been
derived at validate time, and that derivation is complete well before any code
is generated or executed. If you want to control type derivation, you should
create your own sub-class of {{SqlOperator}}, as today.
There are some implementation challenges:
* The code generator will need to generate an instance of {{UdfInitializer}}
for each UDF call that occurs in the query. Some data structures that are
readily available at validate time (e.g. {{RexCall}}) are not easily re-created
at run time, so we should be conservative about what information is available
via {{UdfInitializer}}.
* The code generator must ensure that those instances are constructed exactly
once during the execution of the query; those instances should not be variables
in the {{execute}} method, but should instead be fields, or perhaps static
fields, in the generated class.
* This functionality needs to work through both the interpreter ({{Bindable}}
convention) and generated code ({{Enumerable}} convention).
--
This message was sent by Atlassian Jira
(v8.3.4#803005)