[ https://issues.apache.org/jira/browse/DRILL-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Volodymyr Vysotskyi updated DRILL-7337:
---------------------------------------
Labels: doc-impacting ready-to-commit (was: doc-impacting)

Add vararg UDFs support
-----------------------

Key: DRILL-7337
URL: https://issues.apache.org/jira/browse/DRILL-7337
Project: Apache Drill
Issue Type: Sub-task
Affects Versions: 1.16.0
Reporter: Volodymyr Vysotskyi
Assignee: Volodymyr Vysotskyi
Priority: Major
Labels: doc-impacting, ready-to-commit
Fix For: 1.17.0

The aim of this Jira is to add support for vararg UDFs, which simplifies UDF creation when a function must accept a varying number of arguments.

h2. Requirements for vararg UDFs:
* It should be possible to register vararg UDFs with the same name but different argument types;
* Only vararg UDFs with a single variable-length argument placed after all other arguments should be allowed;
* A vararg UDF should have lower priority than a regular one when both are suitable;
* Besides simple functions, vararg support should be added to aggregate functions.

h2. Implementation details
The lifecycle of a UDF is the following:
* The UDF is validated in the {{FunctionConverter}} class. If there is no problem (the UDF has the required fields with the required types, the required annotations, etc.), it is converted to a {{DrillFuncHolder}} to be registered in the function registry. Corresponding {{SqlFunction}} instances are also created from the {{DrillFuncHolder}} for use in Calcite;
* When a query uses this UDF, Calcite validates that a UDF with the required name, number of arguments and argument types exists (for Drill functions, argument types are not checked at this stage);
* After Calcite finds the required {{SqlFunction}} instance, it uses Drill to find the required {{DrillFuncHolder}}. All the work of determining the most suitable function is done in {{FunctionResolver}} and in {{TypeCastRules.getCost()}};
* At the execution stage, the {{DrillFuncHolder}} is found again using the {{FunctionCall}} instance;
* The {{DrillFuncHolder}} is used for code generation.

Considering these steps, the first thing needed to support vararg UDFs is to update the logic in {{FunctionConverter}} so that vararg UDFs can be registered according to the requirements declared above.

Calcite uses {{SqlOperandTypeChecker}} to verify the number of arguments, so Drill should provide its own implementation for vararg UDFs to be usable. To determine whether a UDF is vararg, a new {{isVarArg}} property will be added to {{FunctionTemplate}}.

The {{TypeCastRules.getCost()}} method should be updated so that it can find vararg UDFs while still prioritizing regular UDFs.

Code generation logic should be updated to handle vararg UDFs. The generated code for a vararg argument looks as follows:
{code:java}
NullableVarCharHolder[] inputs = new NullableVarCharHolder[3];
inputs[0] = out14;
inputs[1] = out19;
inputs[2] = out24;
{code}

To create your own vararg UDF, set the new {{isVarArg}} property to {{true}} in {{FunctionTemplate}}, then declare the vararg input as an array.
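As a rough illustration of the resolution requirements above (a vararg UDF accepts a fixed prefix followed by any number of same-typed trailing arguments, and loses to a regular UDF when both apply), here is a standalone Java sketch. It is not Drill's {{TypeCastRules.getCost()}} or {{FunctionResolver}}: the {{Signature}} class, the {{cost}}/{{resolve}} helpers and the {{VARARG_PENALTY}} value are hypothetical, and type matching is simplified to exact string equality with no implicit casts.

```java
import java.util.List;

/**
 * Hypothetical sketch of vararg-aware overload resolution. This is NOT Drill's
 * TypeCastRules.getCost() or FunctionResolver; types are plain strings and only
 * exact matches are accepted (no implicit casts).
 */
public class VarArgResolutionSketch {

  /** A candidate registered under one function name (hypothetical type). */
  static final class Signature {
    final List<String> fixedTypes; // regular parameters, in order
    final String varArgType;       // element type of the trailing vararg, or null

    Signature(List<String> fixedTypes, String varArgType) {
      this.fixedTypes = fixedTypes;
      this.varArgType = varArgType;
    }

    boolean isVarArg() {
      return varArgType != null;
    }
  }

  // Assumed penalty so that a regular match always beats a vararg match.
  static final int VARARG_PENALTY = 100;

  /** Returns the match cost, or -1 if the candidate cannot accept argTypes. */
  static int cost(Signature sig, List<String> argTypes) {
    if (!sig.isVarArg()) {
      return sig.fixedTypes.equals(argTypes) ? 0 : -1;
    }
    int fixed = sig.fixedTypes.size();
    // The vararg parameter comes after all fixed parameters and may be empty.
    if (argTypes.size() < fixed || !argTypes.subList(0, fixed).equals(sig.fixedTypes)) {
      return -1;
    }
    // All trailing arguments must share the vararg element type.
    for (String type : argTypes.subList(fixed, argTypes.size())) {
      if (!type.equals(sig.varArgType)) {
        return -1;
      }
    }
    return VARARG_PENALTY;
  }

  /** Picks the cheapest applicable candidate, or null if none applies. */
  static Signature resolve(List<Signature> candidates, List<String> argTypes) {
    Signature best = null;
    int bestCost = Integer.MAX_VALUE;
    for (Signature sig : candidates) {
      int c = cost(sig, argTypes);
      if (c >= 0 && c < bestCost) {
        best = sig;
        bestCost = c;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    List<Signature> concatVarchar = List.of(
        new Signature(List.of("VARCHAR", "VARCHAR"), null), // regular overload
        new Signature(List.of(), "VARCHAR"));               // vararg overload
    // Both candidates apply to two VARCHARs; the regular one wins.
    System.out.println(resolve(concatVarchar, List.of("VARCHAR", "VARCHAR")).isVarArg()); // false
    // Only the vararg candidate accepts three VARCHARs.
    System.out.println(resolve(concatVarchar, List.of("VARCHAR", "VARCHAR", "VARCHAR")).isVarArg()); // true
  }
}
```

A fixed additive penalty is just one simple way to express "vararg has lower priority"; any scheme that ranks every applicable regular overload ahead of every applicable vararg overload would satisfy the same requirement.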
Here is an example of a vararg UDF:
{code:java}
import io.netty.buffer.DrillBuf;
import javax.inject.Inject;
import org.apache.drill.exec.expr.DrillSimpleFunc;
import org.apache.drill.exec.expr.annotations.FunctionTemplate;
import org.apache.drill.exec.expr.annotations.Output;
import org.apache.drill.exec.expr.annotations.Param;
import org.apache.drill.exec.expr.holders.VarCharHolder;

@FunctionTemplate(name = "concat_varchar",
    isVarArg = true,
    scope = FunctionTemplate.FunctionScope.SIMPLE)
public class VarCharConcatFunction implements DrillSimpleFunc {

  // The vararg input is declared as an array of value holders.
  @Param VarCharHolder[] inputs;
  @Output VarCharHolder out;
  @Inject DrillBuf buffer;

  @Override
  public void setup() {
  }

  @Override
  public void eval() {
    // First pass: compute the total length of all inputs.
    int length = 0;
    for (VarCharHolder input : inputs) {
      length += input.end - input.start;
    }
    // Make sure the output buffer is large enough.
    out.buffer = buffer = buffer.reallocIfNeeded(length);
    out.start = out.end = 0;
    // Second pass: copy the bytes of each input into the output buffer.
    for (VarCharHolder input : inputs) {
      for (int id = input.start; id < input.end; id++) {
        out.buffer.setByte(out.end++, input.buffer.getByte(id));
      }
    }
  }
}
{code}

h2. Limitations of vararg UDFs:
* The null handling specified in {{FunctionTemplate}} does not affect vararg parameters, i.e. the user should add separate UDFs with non-nullable and nullable value holder vararg fields;
* For value holder vararg fields, vararg UDFs support only vararg arguments of the same type, including nullability. If the vararg field is a {{FieldReader}}, the UDF implementation is fully responsible for handling the types and nullability of the input vararg arguments;
* Scalar replacement does not happen for vararg arguments;
* The UDF implementation should handle the case when the vararg field is empty.

*For documentation*
New functions: collect_to_list, TBA.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
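The nullability and empty-input limitations of DRILL-7337 mean that a nullable vararg UDF has to do its own NULL checks in {{eval()}}. The standalone Java sketch below models only that logic, outside Drill: {{Holder}} is a hypothetical stand-in that mimics the {{isSet}}/{{start}}/{{end}}/{{buffer}} fields of {{NullableVarCharHolder}} with a {{byte[]}} in place of {{DrillBuf}}. Returning NULL when any input is NULL, and an empty string when there are no inputs, are assumed semantics, not documented Drill behavior.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of the eval() logic a nullable vararg concat UDF might need.
 * Not Drill code: Holder only mimics NullableVarCharHolder's fields.
 */
public class NullableVarArgConcatSketch {

  /** Hypothetical stand-in for Drill's NullableVarCharHolder. */
  static final class Holder {
    int isSet;     // 1 = value present, 0 = SQL NULL
    byte[] buffer; // stands in for DrillBuf
    int start;
    int end;

    static Holder of(String value) {
      Holder h = new Holder();
      if (value != null) {
        h.isSet = 1;
        h.buffer = value.getBytes(StandardCharsets.UTF_8);
        h.start = 0;
        h.end = h.buffer.length;
      }
      return h;
    }
  }

  /** Concatenates all inputs; an empty inputs array yields "" (assumed). */
  static String eval(Holder[] inputs) {
    int length = 0;
    for (Holder input : inputs) {
      if (input.isSet == 0) {
        // FunctionTemplate null handling does not cover vararg parameters,
        // so NULL propagation is done by hand (assumed NULL_IF_NULL-like semantics).
        return null;
      }
      length += input.end - input.start;
    }
    byte[] out = new byte[length];
    int pos = 0;
    for (Holder input : inputs) {
      int len = input.end - input.start;
      System.arraycopy(input.buffer, input.start, out, pos, len);
      pos += len;
    }
    return new String(out, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    System.out.println(eval(new Holder[] {Holder.of("ab"), Holder.of("cd")})); // abcd
    System.out.println(eval(new Holder[] {Holder.of("ab"), Holder.of(null)})); // null
    System.out.println(eval(new Holder[0]).isEmpty());                         // true
  }
}
```

In a real Drill UDF the same checks would sit in {{eval()}} of a class whose vararg field is declared as {{@Param NullableVarCharHolder[] inputs}}, registered alongside the non-nullable variant.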