Sounds like a fantastic idea, and WASM seems a natural choice. You get the ability to opt into I/O if you want or need to, via WASI, but by default you can rest assured that the worst-case consequences are contained.
On Mon, Apr 25, 2022 at 4:20 PM Wes McKinney <wesmck...@gmail.com> wrote:
> I was going to reply to this e-mail thread on user@ but thought I would start a new thread on dev@.
>
> Executing user-defined functions in memory, especially untrusted functions, is in general unsafe. For "trusted" functions, having an in-memory API for writing them in user languages is very useful. I remember tinkering with adding UDFs in Impala with LLVM IR, which would allow UDFs to have performance consistent with built-ins (because built-in functions are all inlined into code-generated expressions), but segfaults would bring down the server, so only admins could be trusted to add new UDFs.
>
> However, I wonder if we should eventually define an "external UDF" protocol and an example UDF "harness", using Flight to do RPC across the process boundary. The idea is that an external, local UDF Flight execution service is spun up, and then data is sent to the UDF in a DoExchange call.
>
> As Jacques pointed out in an interview [1], a compelling solution to the UDF sandboxing problem is WASM. This allows "untrusted" WASM functions to be run safely in-process. However, we would need to harden and document the details of the interface between the host language and the user WASM code.
>
> Since there are many different potential kinds of user-defined functions aside from scalar functions, that also increases the complexity and scope of the specification work here.
>
> - Wes
>
> [1]: https://reneeshah.medium.com/how-webassembly-gets-used-the-18-most-exciting-startups-building-with-wasm-939474e951db
>
> On Fri, Apr 22, 2022 at 2:09 PM David Li <lidav...@apache.org> wrote:
> >
> > This is currently being implemented for Python:
> > https://github.com/apache/arrow/pull/12590
> > It may not land for 8.0.0 but should presumably be there for 9.0.0.
> >
> > It is already possible in C++. The same APIs that built-in functions use to register themselves should be available to applications, and there's a fairly trivial example of this in [1]. Such a function would also be available from Python/R/etc. if you could figure out how to package, distribute, and load the application library appropriately.
> >
> > [1]: https://github.com/apache/arrow/blob/e1e782a4542817e8a6139d6d5e022b56abdbc81d/cpp/examples/arrow/compute_register_example.cc
> >
> > On Fri, Apr 22, 2022, at 15:04, Wenlei Xie wrote:
> > > Hi,
> > >
> > > I am wondering if I can define my own Arrow Compute function and use it, say in PyArrow? It looks like Compute Function has a FunctionRegistry, but I didn't find documentation about how to write your own Arrow Compute function (but maybe I just didn't find the right place).
> > >
> > > Thank you so much!
> > >
> > > --
> > > Best Regards,
> > > Wenlei Xie
> > >
> > > Email: wenlei....@gmail.com
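The isolation property behind the "external UDF" idea in this thread (run the user function in a separate process so a crash cannot bring down the host) can be illustrated without Arrow at all. This is a minimal sketch under loud assumptions: a plain stdin/stdout pipe carrying JSON lines stands in for Flight's DoExchange stream and Arrow IPC batches, and every name here (`WORKER_SRC`, `run_udf_out_of_process`, `UdfWorkerCrashed`) is hypothetical. It is not the protocol proposed above, only a demonstration of the process-boundary containment it relies on.

```python
import json
import subprocess
import sys

# Hypothetical worker program standing in for an "external UDF" service.
# It reads one JSON-encoded batch per line from stdin, applies the user's
# function, and writes one JSON-encoded result per line to stdout.
WORKER_SRC = r"""
import json, os, sys

def udf(values):
    # Untrusted user code. A -1 in the input simulates a hard crash
    # (comparable to a segfault) via os.abort().
    if -1 in values:
        os.abort()
    return [v * 2 for v in values]

for line in sys.stdin:
    sys.stdout.write(json.dumps(udf(json.loads(line))) + "\n")
    sys.stdout.flush()
"""


class UdfWorkerCrashed(RuntimeError):
    """Raised in the host process when the UDF worker dies mid-exchange."""


def run_udf_out_of_process(batches):
    """Stream batches to a separate worker process and collect its results.

    A crash in the worker surfaces here as an exception; the host process
    itself keeps running.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", WORKER_SRC],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    results = []
    try:
        for batch in batches:
            try:
                proc.stdin.write(json.dumps(batch) + "\n")
                proc.stdin.flush()
            except BrokenPipeError:
                # The worker was already gone before we could send the batch.
                raise UdfWorkerCrashed(
                    f"worker exited with code {proc.wait()}") from None
            line = proc.stdout.readline()
            if not line:  # EOF: the worker died before answering
                raise UdfWorkerCrashed(f"worker exited with code {proc.wait()}")
            results.append(json.loads(line))
    finally:
        # Closing stdin lets a healthy worker leave its read loop and exit.
        try:
            proc.stdin.close()
        except BrokenPipeError:
            pass
        proc.stdout.close()
        proc.wait()
    return results
```

Calling `run_udf_out_of_process([[1, 2, 3], [10]])` returns `[[2, 4, 6], [20]]`, while a batch containing `-1` aborts the worker and raises `UdfWorkerCrashed` in the host instead of taking it down. The trade-off compared with in-process WASM is the serialization and IPC cost on every batch, which is what using Arrow IPC over Flight would aim to keep small.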