Volodymyr Vysotskyi created DRILL-7337:
------------------------------------------
Summary: Add vararg UDFs support
Key: DRILL-7337
URL: https://issues.apache.org/jira/browse/DRILL-7337
Project: Apache Drill
Issue Type: Sub-task
Affects Versions: 1.16.0
Reporter: Volodymyr Vysotskyi
Assignee: Volodymyr Vysotskyi
Fix For: 1.17.0
The aim of this Jira is to add support for vararg UDFs, which simplifies UDF
creation for the case when a function must accept a varying number of
arguments.
h2. Requirements for vararg UDFs:
* It should be possible to register vararg UDFs with the same name, but with
different argument types;
* Only vararg UDFs with a single variable-length argument placed after all
other arguments should be allowed;
* A vararg UDF should have lower priority than a regular one for the case when
both are suitable;
* Besides simple functions, vararg support should be added to the aggregate
functions.
h2. Implementation details
The lifecycle of a UDF is the following:
* The UDF is validated in the {{FunctionConverter}} class. If there are no
problems (the UDF has the required fields with the required types, the required
annotations, etc.), it is converted to a {{DrillFuncHolder}} to be registered
in the function registry. Corresponding {{SqlFunction}} instances are also
created based on the {{DrillFuncHolder}} to be used in Calcite;
* When a query uses this UDF, Calcite validates that a UDF with the required
name, number of arguments and argument types exists (for Drill, argument types
are not checked at this stage);
* After Calcite finds the required {{SqlFunction}} instance, it uses Drill to
find the required {{DrillFuncHolder}}. All the work of determining the most
suitable function is done in {{FunctionResolver}} and in
{{TypeCastRules.getCost()}};
* At the execution stage, the {{DrillFuncHolder}} is found again using the
{{FunctionCall}} instance;
* The {{DrillFuncHolder}} is used for code generation.
Considering these steps, the first thing to be done for adding support for
vararg UDFs is updating the logic in {{FunctionConverter}} to allow registering
vararg UDFs, taking into account the requirements declared above.
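The placement requirement (a single variable-length argument, declared after all other arguments) can be illustrated with a minimal sketch. This is not the actual {{FunctionConverter}} code: {{VarArgValidationSketch}}, {{ParamInfo}} and {{validate}} are hypothetical names, and the sketch assumes a vararg parameter is recognized by being declared as an array.

```java
// Hypothetical sketch; not Drill's actual FunctionConverter logic.
final class VarArgValidationSketch {

  /** Minimal stand-in for a UDF input parameter: its name and whether it is declared as an array. */
  static final class ParamInfo {
    final String name;
    final boolean isArray;

    ParamInfo(String name, boolean isArray) {
      this.name = name;
      this.isArray = isArray;
    }
  }

  /**
   * Checks the parameters of a UDF flagged as vararg.
   * Returns null when the list is acceptable, otherwise a message describing the problem.
   */
  static String validate(ParamInfo[] params) {
    if (params.length == 0 || !params[params.length - 1].isArray) {
      return "A vararg UDF must declare its last parameter as an array";
    }
    // Every parameter before the last one must be a regular (non-array) parameter.
    for (int i = 0; i < params.length - 1; i++) {
      if (params[i].isArray) {
        return "Parameter '" + params[i].name + "' is an array but is not the last parameter";
      }
    }
    return null;
  }
}
```

With this check, a UDF declaring (separator, inputs[]) passes, while (inputs[], separator) or (a[], b[]) is rejected.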
Calcite uses {{SqlOperandTypeChecker}} to verify the number of arguments, so
Drill should provide its own implementation for vararg UDFs to make them
usable. To determine whether a UDF is vararg, a new {{isVarArg}} property will
be added to the {{FunctionTemplate}}.
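As a rough illustration of the argument-count rule such a checker would need, the sketch below accepts any number of arguments at or above the number of fixed parameters when the UDF is vararg, and an exact count otherwise. It is plain Java and does not use Calcite's API; {{VarArgOperandCountSketch}} and its members are hypothetical names, and allowing zero vararg values is an assumption based on the limitation about empty vararg fields.

```java
// Hypothetical sketch of the operand-count rule; not Drill's or Calcite's actual checker.
final class VarArgOperandCountSketch {
  private final int fixedParamCount;
  private final boolean varArg;

  /**
   * @param declaredParamCount number of parameters declared in the UDF,
   *                           including the trailing array parameter if the UDF is vararg
   * @param varArg             value of the isVarArg property
   */
  VarArgOperandCountSketch(int declaredParamCount, boolean varArg) {
    // The trailing array parameter matches zero or more arguments, so it is not a fixed one.
    this.fixedParamCount = varArg ? declaredParamCount - 1 : declaredParamCount;
    this.varArg = varArg;
  }

  /** Returns true if a call with the given number of arguments can match this UDF. */
  boolean isValidCount(int argCount) {
    return varArg ? argCount >= fixedParamCount : argCount == fixedParamCount;
  }
}
```

For a vararg UDF declared with one regular parameter and one array parameter, calls with one or more arguments pass the count check, while a call with no arguments does not.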
The {{TypeCastRules.getCost()}} method should be updated to be able to find
vararg UDFs and to prioritize regular UDFs over them.
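One way to picture this prioritization is a cost function that adds a fixed penalty to vararg candidates, so that a regular UDF wins whenever both match. The sketch below is a simplification and not the real {{TypeCastRules.getCost()}}: it treats types as plain strings, requires exact type matches (no implicit casts), and {{VarArgCostSketch}}, {{VARARG_PENALTY}} and {{pickBest}} are hypothetical names.

```java
// Hypothetical sketch of cost-based resolution; not Drill's actual TypeCastRules logic.
final class VarArgCostSketch {
  static final int NOT_APPLICABLE = -1;
  // Assumed penalty that makes any vararg match more expensive than a regular match.
  static final int VARARG_PENALTY = 1000;

  /**
   * Returns the cost of calling a UDF with the given argument types, or NOT_APPLICABLE.
   * For a vararg UDF, the last entry of paramTypes is the element type of the array parameter.
   */
  static int getCost(String[] argTypes, String[] paramTypes, boolean varArg) {
    if (varArg && paramTypes.length == 0) {
      return NOT_APPLICABLE; // a vararg UDF must declare the array parameter
    }
    int fixedCount = varArg ? paramTypes.length - 1 : paramTypes.length;
    boolean countMatches = varArg ? argTypes.length >= fixedCount : argTypes.length == fixedCount;
    if (!countMatches) {
      return NOT_APPLICABLE;
    }
    for (int i = 0; i < argTypes.length; i++) {
      // Arguments past the fixed ones are all checked against the vararg element type.
      String expected = i < fixedCount ? paramTypes[i] : paramTypes[paramTypes.length - 1];
      if (!expected.equals(argTypes[i])) {
        return NOT_APPLICABLE;
      }
    }
    return varArg ? VARARG_PENALTY : 0;
  }

  /** Returns the index of the cheapest applicable candidate, or -1 if none applies. */
  static int pickBest(String[] argTypes, String[][] candidateParams, boolean[] candidateVarArg) {
    int bestIndex = -1;
    int bestCost = Integer.MAX_VALUE;
    for (int i = 0; i < candidateParams.length; i++) {
      int cost = getCost(argTypes, candidateParams[i], candidateVarArg[i]);
      if (cost != NOT_APPLICABLE && cost < bestCost) {
        bestCost = cost;
        bestIndex = i;
      }
    }
    return bestIndex;
  }
}
```

Given a regular two-argument UDF and a vararg UDF of the same element type, a two-argument call resolves to the regular UDF, and a three-argument call falls back to the vararg one.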
Code generation logic should be updated to handle vararg UDFs. The generated
code for a vararg argument will look like this:
{code:java}
NullableVarCharHolder[] inputs = new NullableVarCharHolder[3];
inputs[0] = out14;
inputs[1] = out19;
inputs[2] = out24;
{code}
To create a custom vararg UDF, the new {{isVarArg}} property should be set to
{{true}} in {{FunctionTemplate}}.
After that, the required vararg input should be declared as an array.
Here is an example of a vararg UDF:
{code:java}
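import io.netty.buffer.DrillBuf;
import javax.inject.Inject;
import org.apache.drill.exec.expr.DrillSimpleFunc;
import org.apache.drill.exec.expr.annotations.FunctionTemplate;
import org.apache.drill.exec.expr.annotations.Output;
import org.apache.drill.exec.expr.annotations.Param;
import org.apache.drill.exec.expr.holders.VarCharHolder;

@FunctionTemplate(name = "concat_varchar",
                  isVarArg = true,
                  scope = FunctionTemplate.FunctionScope.SIMPLE)
public class VarCharConcatFunction implements DrillSimpleFunc {
  // The vararg input is declared as an array of value holders.
  @Param VarCharHolder[] inputs;
  @Output VarCharHolder out;
  @Inject DrillBuf buffer;

  @Override
  public void setup() {
  }

  @Override
  public void eval() {
    // Total number of bytes across all inputs (0 when no inputs are passed).
    int length = 0;
    for (VarCharHolder input : inputs) {
      length += input.end - input.start;
    }

    out.buffer = buffer = buffer.reallocIfNeeded(length);
    out.start = out.end = 0;

    // Copy each input's bytes into the output buffer in order.
    for (VarCharHolder input : inputs) {
      for (int id = input.start; id < input.end; id++) {
        out.buffer.setByte(out.end++, input.buffer.getByte(id));
      }
    }
  }
}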
{code}
h2. Limitations connected with VarArg UDFs:
* The nulls handling specified in {{FunctionTemplate}} does not affect vararg
parameters, i.e. the user should add separate UDFs with non-nullable and
nullable value holder vararg fields;
* VarArg UDFs support only values of the same type, including nullability, for
vararg arguments declared as value holder fields. If the vararg field is a
{{FieldReader}}, all the responsibility for handling the types and nullability
of the input vararg fields is placed on the UDF implementation;
* Scalar replacement does not happen for vararg arguments;
* The UDF implementation should handle the case when the vararg field is empty.