[ 
https://issues.apache.org/jira/browse/DRILL-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-7337:
---------------------------------------
    Labels: doc-impacting ready-to-commit  (was: doc-impacting)

> Add vararg UDFs support
> -----------------------
>
>                 Key: DRILL-7337
>                 URL: https://issues.apache.org/jira/browse/DRILL-7337
>             Project: Apache Drill
>          Issue Type: Sub-task
>    Affects Versions: 1.16.0
>            Reporter: Volodymyr Vysotskyi
>            Assignee: Volodymyr Vysotskyi
>            Priority: Major
>              Labels: doc-impacting, ready-to-commit
>             Fix For: 1.17.0
>
>
> The aim of this Jira is to add support for vararg UDFs to simplify UDF 
> creation when a function must accept a variable number of arguments.
> h2. Requirements for vararg UDFs:
>  * It should be possible to register vararg UDFs with the same name, but with 
> different argument types;
>  * Only vararg UDFs with a single variable-length argument placed after all 
> other arguments should be allowed;
>  * A vararg UDF should have lower priority than a regular one when both 
> are suitable;
>  * Besides simple functions, vararg support should be added to the aggregate 
> functions.
> h2. Implementation details
> The lifecycle of UDF is the following:
>  * UDF is validated in {{FunctionConverter}} class and for the case when 
> there is no problem (UDF has required fields with required types, required 
> annotations, etc.), it is converted to the {{DrillFuncHolder}} to be 
> registered in the function registry. Also, corresponding {{SqlFunction}} 
> instances are created based on {{DrillFuncHolder}} to be used in Calcite;
>  * When a query uses this UDF, Calcite validates that a UDF with the 
> required name, number of arguments, and argument types exists (for Drill, 
> argument types are not checked at this stage);
>  * After Calcite has found the required {{SqlFunction}} instance, it uses 
> Drill to find the required {{DrillFuncHolder}}. All the work of determining 
> the most suitable function is done in {{FunctionResolver}} and in 
> {{TypeCastRules.getCost()}};
>  * At the execution stage, {{DrillFuncHolder}} is found again using the 
> {{FunctionCall}} instance;
>  * {{DrillFuncHolder}} is used for code generation.
> Considering these steps, the first thing to be done for adding support for 
> vararg UDFs is updating logic in {{FunctionConverter}} to allow registering 
> vararg UDFs taking into account requirements declared above.
> Calcite uses {{SqlOperandTypeChecker}} to verify the number of arguments, so 
> Drill should provide its own checker for vararg UDFs. To determine whether a 
> UDF is vararg, a new {{isVarArg}} property will be added to 
> {{FunctionTemplate}}.
> {{TypeCastRules.getCost()}} method should be updated to be able to find 
> vararg UDFs and prioritize regular UDFs.
> Code generation logic should be updated to handle vararg UDFs. The generated 
> code for a vararg argument will look like this:
> {code:java}
>   NullableVarCharHolder[] inputs = new NullableVarCharHolder[3];
>   inputs[0] = out14;
>   inputs[1] = out19;
>   inputs[2] = out24;
> {code}
> To create a custom vararg UDF, set the new {{isVarArg}} property to {{true}} 
> in {{FunctionTemplate}}, then declare the vararg input as an array.
> Here is an example of a vararg UDF:
> {code:java}
>   @FunctionTemplate(name = "concat_varchar",
>                     isVarArg = true,
>                     scope = FunctionTemplate.FunctionScope.SIMPLE)
>   public class VarCharConcatFunction implements DrillSimpleFunc {
>     @Param VarCharHolder[] inputs;
>     @Output VarCharHolder out;
>     @Inject DrillBuf buffer;
>
>     @Override
>     public void setup() {
>     }
>
>     @Override
>     public void eval() {
>       // Total byte length of all inputs; stays 0 when no arguments are passed.
>       int length = 0;
>       for (VarCharHolder input : inputs) {
>         length += input.end - input.start;
>       }
>       out.buffer = buffer = buffer.reallocIfNeeded(length);
>       out.start = out.end = 0;
>       for (VarCharHolder input : inputs) {
>         for (int id = input.start; id < input.end; id++) {
>           out.buffer.setByte(out.end++, input.buffer.getByte(id));
>         }
>       }
>     }
>   }
> {code}
> h2. Limitations connected with VarArg UDFs:
>  * The null handling specified in FunctionTemplate does not affect vararg 
> parameters, i.e. the user should add separate UDFs with non-nullable and 
> nullable value holder vararg fields;
>  * For value holder vararg fields, vararg UDFs support only arguments of the 
> same type, including nullability. If the vararg field is a FieldReader, all 
> the responsibility for handling types and nullability of input vararg fields 
> is placed on the UDF implementation;
>  * Scalar replacement does not happen for vararg arguments;
>  * The UDF implementation should handle the case when the vararg field is 
> empty.
> *For documentation*
> New functions: collect_to_list, TBA.
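
As a rough illustration of the priority requirement quoted above (a regular
UDF should win over a vararg UDF when both are suitable), here is a minimal
plain-Java sketch. It is a toy model and not Drill's FunctionResolver or
TypeCastRules.getCost(): the Signature class, the cost values 0 and 1, and the
use of strings as type names are all hypothetical, and implicit casts are
ignored entirely.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy overload resolution; all names here are hypothetical, not Drill APIs. */
final class VarargResolutionSketch {

  /** Fixed leading parameter types plus an optional trailing vararg type. */
  static final class Signature {
    final String label;
    final List<String> fixedTypes;
    final String varargType; // null for a regular (non-vararg) function

    Signature(String label, List<String> fixedTypes, String varargType) {
      this.label = label;
      this.fixedTypes = fixedTypes;
      this.varargType = varargType;
    }

    /** Returns 0 for a regular match, 1 for a vararg match, -1 for no match. */
    int cost(List<String> argTypes) {
      if (argTypes.size() < fixedTypes.size()) {
        return -1;
      }
      if (varargType == null && argTypes.size() != fixedTypes.size()) {
        return -1;
      }
      for (int i = 0; i < fixedTypes.size(); i++) {
        if (!fixedTypes.get(i).equals(argTypes.get(i))) {
          return -1;
        }
      }
      // Trailing arguments must all have the vararg type (same type only).
      for (int i = fixedTypes.size(); i < argTypes.size(); i++) {
        if (!varargType.equals(argTypes.get(i))) {
          return -1;
        }
      }
      return varargType == null ? 0 : 1;
    }
  }

  /** Picks the matching candidate with the lowest cost, or null if none match. */
  static String resolve(List<Signature> candidates, List<String> argTypes) {
    Signature best = null;
    int bestCost = Integer.MAX_VALUE;
    for (Signature candidate : candidates) {
      int cost = candidate.cost(argTypes);
      if (cost >= 0 && cost < bestCost) {
        best = candidate;
        bestCost = cost;
      }
    }
    return best == null ? null : best.label;
  }

  public static void main(String[] args) {
    List<Signature> candidates = Arrays.asList(
        new Signature("concat(VARCHAR...)", Collections.<String>emptyList(), "VARCHAR"),
        new Signature("concat(VARCHAR, VARCHAR)", Arrays.asList("VARCHAR", "VARCHAR"), null));
    // Both candidates match two VARCHAR arguments; the regular one is chosen.
    System.out.println(resolve(candidates, Arrays.asList("VARCHAR", "VARCHAR")));
    // Only the vararg candidate matches three arguments.
    System.out.println(resolve(candidates, Arrays.asList("VARCHAR", "VARCHAR", "VARCHAR")));
  }
}
```

In this sketch the vararg candidate is listed first on purpose, to show that
the result depends on cost and not on registration order. Two vararg
candidates with the same name but different element types (for example
VARCHAR and INT) can coexist, since at most one of them matches any
non-empty argument list.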



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
