> At the moment as we are not exposing the execution engine primitives to Python user, are you expecting to expose them by this approach?
From our side, these APIs are not directly exposed to the end user; rather,
they are primitives that we can build on top of.

The end user would just do something like this (not the actual API, just to
give some idea):

    from data import some_storage
    table = some_storage(path)
    result = run_query(table)

But we will use the data-source UDF primitives to implement "some_storage".

On Fri, Jun 3, 2022 at 1:53 PM Vibhatha Abeykoon <vibha...@gmail.com> wrote:

> First of all, this is a nice discussion, but I have a doubt.
>
> I have a question regarding the simplicity of things. At the moment as we
> are not exposing the execution engine primitives to Python user, are you
> expecting to expose them by this approach?
>
> On Fri, Jun 3, 2022 at 9:02 PM Yaron Gvili <rt...@hotmail.com> wrote:
>
> > Hi,
> >
> > I'm working on support for data-source UDFs and would like to get
> > feedback about the design I have in mind for it.
> >
> > By support for data-source UDFs, at a basic level, I mean enabling a user
> > to define using PyArrow APIs a record-batch-generating function
> > implemented in Python that would be easily plugged into a source-node in a
> > streaming-engine execution plan. Such functions are similar to the
> > existing scalar UDFs with zero inputs, but an important difference is that
> > scalar UDFs are plugged and composed in expressions whereas data-source
> > UDFs would be plugged into a source-node.
> >
> > Focusing on the Arrow and PyArrow parts (I'm leaving the Ibis and
> > Ibis-Substrait parts out), the design I have in mind includes:
> >
> >   * In Arrow: Adding a new source-UDF kind of arrow::compute::Function,
> >     for functions that generate data. Such functions would be registered
> >     in a FunctionRegistry but not used in scalar expressions nor composed.
> >   * In Arrow: Adding SourceUdfContext and SourceUdfOptions (similar to
> >     ScalarUdfContext and ScalarUdfOptions) in "cpp/src/arrow/python/udf.h".
> >   * In Arrow: Adding a UdfSourceExecNode into which a (source-UDF-kind
> >     of) function can be plugged.
> >   * In PyArrow: Following the design of scalar UDFs, and hopefully
> >     reusing much of it.
> >
> > Cheers,
> > Yaron.
> >
>
> --
> Vibhatha Abeykoon
>