I think this is a good proposal and I support its implementation, for
whatever that is worth.

On Sun, Aug 23, 2020 at 12:17 PM Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:

> Hi,
>
> I have run into a limitation, and I would like to propose a resolution.
>
> TL;DR: currently, users plan UDF calls with a call of the form
>
> let e = scalar_functions("my_udf", vec![col("a")], DataType::Float64);
> df.select(vec![e])
>
> The proposal is to use this instead:
>
> let f = df.registry();
> let e = f.udf("my_udf", vec![col("a")])?; // note: no DataType::Float64
> df.select(vec![e])
>
> so that users do not have to know the return type of the UDF they are calling
> (they still need to set it when registering the UDF). This will make our
> lives easier, and will also enable our own UDFs (e.g. sqrt) to support
> multiple types (e.g. Float32 and Float64). This will be important for
> functions that return composite objects, such as array(), whose return type
> depends heavily on the types of their arguments.
>
> Proposal:
>
> https://docs.google.com/document/d/1Kzz642ScizeKXmVE1bBlbLvR663BKQaGqVIyy9cAscY/edit?usp=sharing
>
> Issue: https://issues.apache.org/jira/browse/ARROW-9836
> PR: https://github.com/apache/arrow/pull/8032
>
> Best,
> Jorge
>
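A minimal, self-contained sketch of the idea in the proposal, assuming simplified hypothetical `DataType`, `Expr`, `col`, and `Registry` types (this is not DataFusion's actual API). Each UDF is registered together with a rule that derives its return type from its argument types, so `udf()` can resolve the type at planning time without the caller passing a `DataType`, and a single `sqrt` can return Float32 or Float64 depending on its input. For brevity, a column carries its own type here; a real planner would resolve it against a schema.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for Arrow's DataType.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Float32,
    Float64,
    List(Box<DataType>),
}

/// Hypothetical, simplified logical expression.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    // Assumption: columns carry their type instead of being looked up in a schema.
    Column(String, DataType),
    ScalarUdf {
        name: String,
        args: Vec<Expr>,
        return_type: DataType,
    },
}

impl Expr {
    /// The type this expression evaluates to.
    pub fn data_type(&self) -> DataType {
        match self {
            Expr::Column(_, t) => t.clone(),
            Expr::ScalarUdf { return_type, .. } => return_type.clone(),
        }
    }
}

/// Hypothetical helper: a typed column reference.
pub fn col(name: &str, data_type: DataType) -> Expr {
    Expr::Column(name.to_string(), data_type)
}

/// Derives a UDF's return type from its argument types.
type ReturnTypeFn = Box<dyn Fn(&[DataType]) -> Result<DataType, String>>;

/// Hypothetical registry: the return-type rule is supplied once, at registration.
#[derive(Default)]
pub struct Registry {
    udfs: HashMap<String, ReturnTypeFn>,
}

impl Registry {
    /// Registers a UDF name together with its return-type rule.
    pub fn register<F>(&mut self, name: &str, return_type: F)
    where
        F: Fn(&[DataType]) -> Result<DataType, String> + 'static,
    {
        self.udfs.insert(name.to_string(), Box::new(return_type));
    }

    /// Plans a call to a registered UDF; the caller never passes a DataType.
    pub fn udf(&self, name: &str, args: Vec<Expr>) -> Result<Expr, String> {
        let return_type_fn = self
            .udfs
            .get(name)
            .ok_or_else(|| format!("UDF '{}' is not registered", name))?;
        let arg_types: Vec<DataType> = args.iter().map(Expr::data_type).collect();
        let return_type = return_type_fn(&arg_types[..])?;
        Ok(Expr::ScalarUdf {
            name: name.to_string(),
            args,
            return_type,
        })
    }
}

/// Rule for a hypothetical sqrt: the output type matches the input type.
pub fn sqrt_return_type(args: &[DataType]) -> Result<DataType, String> {
    match args {
        [DataType::Float32] => Ok(DataType::Float32),
        [DataType::Float64] => Ok(DataType::Float64),
        other => Err(format!("sqrt: unsupported argument types {:?}", other)),
    }
}

/// Rule for a hypothetical array(): a list of the arguments' common type.
pub fn array_return_type(args: &[DataType]) -> Result<DataType, String> {
    match args.first() {
        Some(first) if args.iter().all(|t| t == first) => {
            Ok(DataType::List(Box::new(first.clone())))
        }
        _ => Err(format!("array: arguments must share one type, got {:?}", args)),
    }
}
```

With both rules registered, `registry.udf("sqrt", vec![col("a", DataType::Float32)])` plans an expression of type Float32, and the same call on a Float64 column plans Float64. Expressing the return type as a function of the argument types, rather than as a single fixed DataType, is what lets array() return a list whose element type follows its inputs.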
