Sounds like a fantastic idea, and WASM seems like a natural choice.

With WASI you get the ability to opt into IO if you want or need to, but by
default you can rest assured that the worst-case consequences are contained.
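
For comparison, here is a rough sketch of the containment you get from the
out-of-process route Wes describes in the quoted message, where the UDF lives
behind a process boundary. It is only an illustration under loud assumptions:
a plain subprocess with JSON lines over stdin/stdout stands in for a Flight
DoExchange call, and `WORKER_SOURCE`, `udf`, and `run_external_udf` are
hypothetical names, not Arrow APIs. It also sends all batches in one shot
rather than streaming them.

```python
import json
import subprocess
import sys

# Hypothetical UDF worker: reads one JSON batch per line on stdin and writes
# one JSON result per line on stdout. This stands in for an external UDF
# service; it is not Arrow Flight and uses no Arrow APIs.
WORKER_SOURCE = r"""
import json, sys

def udf(values):
    # The user-defined function: add one to every value in the batch.
    return [v + 1 for v in values]

for line in sys.stdin:
    sys.stdout.write(json.dumps(udf(json.loads(line))) + "\n")
"""


def run_external_udf(batches, worker_source=WORKER_SOURCE, timeout=30):
    """Run the UDF in a child process and return one result per batch.

    If the worker crashes, the host sees a RuntimeError instead of going
    down with it.
    """
    payload = "".join(json.dumps(batch) + "\n" for batch in batches)
    done = subprocess.run(
        [sys.executable, "-c", worker_source],
        input=payload,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if done.returncode != 0:
        raise RuntimeError(f"UDF worker failed with exit code {done.returncode}")
    return [json.loads(line) for line in done.stdout.splitlines()]


if __name__ == "__main__":
    print(run_external_udf([[1, 2], [3]]))
```

The trade-off against WASM is that every batch crosses a process boundary,
whereas a WASM function could run in-process with similar containment.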

On Mon, Apr 25, 2022 at 4:20 PM Wes McKinney <wesmck...@gmail.com> wrote:

> I was going to reply to this e-mail thread on user@ but thought I
> would start a new thread on dev@.
>
> Executing user-defined functions in memory, especially untrusted
> functions, in general is unsafe. For "trusted" functions, having an
> in-memory API for writing them in user languages is very useful. I
> remember tinkering with adding UDFs in Impala with LLVM IR, which
> would allow UDFs to have performance consistent with built-ins
> (because built-in functions are all inlined into code-generated
> expressions), but segfaults would bring down the server, so only
> admins could be trusted to add new UDFs.
>
> However, I wonder if we should eventually define an "external UDF"
> protocol and an example UDF "harness", using Flight to do RPC across
> the process boundaries. So the idea is that an external local UDF
> Flight execution service is spun up, and then data is sent to the UDF
> in a DoExchange call.
>
> As Jacques pointed out in an interview [1], a compelling solution to
> the UDF sandboxing problem is WASM. This allows "untrusted" WASM
> functions to be run safely in-process. However, we would need to
> harden and document the details of the interface between the host
> language and the user WASM code.
>
> Since there are many different potential kinds of user-defined
> functions aside from scalar functions, that increases the complexity /
> scope of specification work here also.
>
> - Wes
>
> [1]:
> https://reneeshah.medium.com/how-webassembly-gets-used-the-18-most-exciting-startups-building-with-wasm-939474e951db
>
> On Fri, Apr 22, 2022 at 2:09 PM David Li <lidav...@apache.org> wrote:
> >
> > This is currently being implemented for Python:
> > https://github.com/apache/arrow/pull/12590
> > It may not land for 8.0.0 but should be there for 9.0.0, presumably.
> >
> > It is already possible in C++. The same APIs that built-in functions use
> > to register themselves should be available to applications and there's a
> > fairly trivial example of this in [1]. Such a function would also be
> > available from Python/R/etc. if you could figure out how to
> > package/distribute/load the application library appropriately.
> >
> > [1]:
> > https://github.com/apache/arrow/blob/e1e782a4542817e8a6139d6d5e022b56abdbc81d/cpp/examples/arrow/compute_register_example.cc
> >
> > On Fri, Apr 22, 2022, at 15:04, Wenlei Xie wrote:
> >
> > Hi,
> >
> > I am wondering if I can define my own Arrow Compute function and use it,
> > say in PyArrow? It looks like Compute Function has a FunctionRegistry, but I
> > didn't find documentation about how to write your own Arrow Compute
> > function (but maybe I just didn't find the right place).
> >
> > Thank you so much!
> >
> > --
> > Best Regards,
> > Wenlei Xie
> >
> > Email: wenlei....@gmail.com
> >
> >
>
