I think this is a good proposal and I support its implementation, for whatever that is worth.

On Sun, Aug 23, 2020 at 12:17 PM Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:

> Hi,
>
> I came across a limitation that I would like to propose a resolution to.
>
> TL;DR: currently, users plan UDF calls via a call of the form
>
>     let e = scalar_functions("my_udf", vec![col("a")], DataType::Float64);
>     df.select(vec![e])
>
> The proposal is to use instead:
>
>     let f = df.registry();
>     let e = f.udf("my_udf", vec![col("a")])?;
>     // note: no DataType::Float64
>     df.select(vec![e])
>
> so that users do not have to know the return type of the UDF they are using
> (they still need to set it during registration). This will make our lives
> easier, and will also enable our own UDFs (e.g. sqrt) to support variable
> types (e.g. float32 and float64). This will be important for functions that
> return composite objects, such as array(), whose return type heavily
> depends on its input type.
>
> Proposal:
> https://docs.google.com/document/d/1Kzz642ScizeKXmVE1bBlbLvR663BKQaGqVIyy9cAscY/edit?usp=sharing
>
> Issue: https://issues.apache.org/jira/browse/ARROW-9836
> PR: https://github.com/apache/arrow/pull/8032
>
> Best,
> Jorge
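
For anyone following the thread, here is a rough, self-contained sketch of the idea as I understand it: the return type is supplied once at registration (as a function of the argument types), and the registry resolves it when the call is planned. This is not the code from the PR. `Registry`, `register`, `ReturnTypeFn`, `sqrt_return_type`, and the simplified `Expr` and `DataType` enums are all hypothetical stand-ins, and `col` takes an explicit type here only because the sketch has no schema to look it up in.

```rust
use std::collections::HashMap;

// Simplified stand-ins for the real types; every name here is hypothetical.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Float32,
    Float64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    // A column that carries its own type (a real planner would consult the schema).
    Column { name: String, data_type: DataType },
    // A planned UDF call whose return type was resolved by the registry.
    ScalarUdf { name: String, args: Vec<Expr>, return_type: DataType },
}

impl Expr {
    fn data_type(&self) -> DataType {
        match self {
            Expr::Column { data_type, .. } => data_type.clone(),
            Expr::ScalarUdf { return_type, .. } => return_type.clone(),
        }
    }
}

fn col(name: &str, data_type: DataType) -> Expr {
    Expr::Column { name: name.to_string(), data_type }
}

// Given at registration time: maps argument types to the return type.
type ReturnTypeFn = fn(&[DataType]) -> Result<DataType, String>;

#[derive(Default)]
struct Registry {
    udfs: HashMap<String, ReturnTypeFn>,
}

impl Registry {
    fn register(&mut self, name: &str, return_type: ReturnTypeFn) {
        self.udfs.insert(name.to_string(), return_type);
    }

    // Plans a call to a registered UDF; the caller never states the return type.
    fn udf(&self, name: &str, args: Vec<Expr>) -> Result<Expr, String> {
        let return_type_fn = self
            .udfs
            .get(name)
            .copied()
            .ok_or_else(|| format!("unknown UDF: {}", name))?;
        let arg_types: Vec<DataType> = args.iter().map(Expr::data_type).collect();
        let return_type = return_type_fn(arg_types.as_slice())?;
        Ok(Expr::ScalarUdf { name: name.to_string(), args, return_type })
    }
}

// Example rule for a sqrt-like function: the output type follows the input type.
fn sqrt_return_type(arg_types: &[DataType]) -> Result<DataType, String> {
    match arg_types {
        [DataType::Float32] => Ok(DataType::Float32),
        [DataType::Float64] => Ok(DataType::Float64),
        other => Err(format!("sqrt does not support {:?}", other)),
    }
}
```

With this shape, registering `sqrt_return_type` under "sqrt" and calling `registry.udf("sqrt", vec![col("a", DataType::Float32)])` gives an expression typed Float32, while the same call on a Float64 column gives Float64. An unsupported argument type or an unregistered name surfaces as an error at planning time, which is where the `?` in the proposed call would come in.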