Hi Felipe,

Vectorization is applied whenever possible. When all input and output types of a function are primitive (int16, int32, int64, float32, float64) and none of them involves Option or Result, the macro automatically generates code based on the unary <https://docs.rs/arrow/latest/arrow/compute/fn.unary.html> or binary <https://docs.rs/arrow/latest/arrow/compute/fn.binary.html> kernels, which potentially allows for vectorization.
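To make the distinction concrete, here is a rough std-only sketch. It is not the code the macro generates and it does not use the arrow crate: `binary_kernel` and `div_column` are hypothetical names, and the sketch assumes null-free columns of equal length. The infallible path is a tight element-wise loop, which is the shape LLVM can usually auto-vectorize; the fallible path has to be able to stop at the first error, which typically gets in the way.

```rust
// Hypothetical stand-in for a binary kernel: applies `op` to each pair of
// values. Assumes both columns are null-free and have the same length.
fn binary_kernel<F>(a: &[i32], b: &[i32], op: F) -> Vec<i32>
where
    F: Fn(i32, i32) -> i32,
{
    assert_eq!(a.len(), b.len(), "columns must have equal length");
    a.iter().zip(b.iter()).map(|(&x, &y)| op(x, y)).collect()
}

// Infallible scalar function with primitive input and output.
// wrapping_add avoids the overflow check that `a + b` has in debug builds.
fn add(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

// Fallible scalar function: the Result forces a per-element branch.
fn div(x: i32, y: i32) -> Result<i32, &'static str> {
    if y == 0 {
        return Err("division by zero");
    }
    // wrapping_div cannot panic here because y != 0 was checked above.
    Ok(x.wrapping_div(y))
}

// Element-wise division that short-circuits on the first error.
fn div_column(a: &[i32], b: &[i32]) -> Result<Vec<i32>, &'static str> {
    assert_eq!(a.len(), b.len(), "columns must have equal length");
    a.iter().zip(b.iter()).map(|(&x, &y)| div(x, y)).collect()
}

fn main() {
    let a = [1, 2, 3, 4];
    let b = [10, 20, 30, 40];
    println!("{:?}", binary_kernel(&a, &b, add));
    println!("{:?}", div_column(&b, &a));
    println!("{:?}", div_column(&a, &[1, 0, 1, 1]));
}
```

Whether the first loop is actually vectorized depends on the compiler version, optimization level, and target, so it is worth checking the generated assembly rather than assuming it.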
Neither of the examples you showed is vectorized: `div` because it returns a Result, and `gcd` because of the loop in its implementation. However, if the function is simple enough, like this `add` function:

#[function("add(int, int) -> int")]
fn add(a: i32, b: i32) -> i32 {
    a + b
}

it can be auto-vectorized by LLVM.

Runji

On 2024/06/28 17:13:16 Felipe Oliveira Carvalho wrote:
> On Fri, Jun 28, 2024 at 11:07 AM Andrew Lamb <al...@influxdata.com> wrote:
> >
> > Hi Xuanwo,
> >
> > Sorry for the delay in responding. I think the ability to easily write
> > functions that "feel" like native functions in whatever language and be
> > able to generate arrow / vectorized versions of them is quite valuable.
> > This is my understanding of what this proposal is about.
>
> My understanding is that it's not vectorized. From the examples in
> risingwavelabs/arrow-udf, <https://github.com/risingwavelabs/arrow-udf> it
> looks like the macros generate code that gathers values from columns into
> local scalars that are passed as scalar parameters to user functions. Is
> the hope here that rustc/llvm will auto-vectorize the code?
>
> #[function("gcd(int, int) -> int")]
> fn gcd(mut a: i32, mut b: i32) -> i32 {
>     while b != 0 {
>         (a, b) = (b, a % b);
>     }
>     a
> }
>
> #[function("div(int, int) -> int")]
> fn div(x: i32, y: i32) -> Result<i32, &'static str> {
>     if y == 0 {
>         return Err("division by zero");
>     }
>     Ok(x / y)
> }
>
> > I left some additional comments on the markdown.
> >
> > One thing that might be worth doing is articulate some other potential
> > locations for where the code might go. One option, as I think you propose,
> > is to make its own repository. Another option could be to donate the code
> > and put the various language bindings in the same repo as the arrow
> > language implementations (e.g arrow-rs, arrow for python, etc) which would
> > likely make it easier to maintain and discover.
> >
> > I am curious about what other devs / users feel about this?
> >
> > Andrew
> >
> >
> >
> > On Thu, Jun 20, 2024 at 3:04 AM Xuanwo <xu...@apache.org> wrote:
> >
> > > Hello, everyone.
> > >
> > > I start this thread to discuss the donation of a User-Defined Function
> > > Framework for Apache Arrow.
> > >
> > > Feel free to review and leave your comments here. For live review, please
> > > visit:
> > >
> > > https://hackmd.io/@xuanwo/apache-arrow-udf
> > >
> > > The original content also pasted here for a quick reading:
> > >
> > > ------
> > >
> > > ## Abstract
> > >
> > > Arrow UDF is a User-Defined Function Framework for Apache Arrow.
> > >
> > > ## Proposal
> > >
> > > Arrow UDF allows user to easily create and run user-defined functions
> > > (UDF) in Rust, Python, Java or JavaScript based on Apache Arrow. The
> > > functions can be executed natively, or in WebAssembly, or in a remote
> > > server via Arrow Flight.
> > >
> > > Arrow UDF was originally designed to be used by the RisingWave project but
> > > is now being used by Databend and several database startups.
> > >
> > > We believe that the Arrow UDF project will provide diversity value to the
> > > entire Arrow community.
> > >
> > > ## Background
> > >
> > > Arrow UDF is being developed by an open-source community from day one and
> > > is owned by RisingWaveLabs. The project has been launched in December 2023.
> > >
> > > ## Initial Goals
> > >
> > > By transferring ownership of the project to the Apache Arrow, Arrow UDF
> > > expects to ensure its neutrality and further encourage and facilitate the
> > > adoption of Arrow UDF by the community.
> > >
> > > ## Current Status
> > >
> > > Contributors: 5
> > >
> > > Users:
> > >
> > > - [RisingWave]: A Distributed SQL Database for Stream Processing.
> > > - [Databend]: An open-source cloud data warehouse that serves as a
> > > cost-effective alternative to Snowflake.
> > >
> > > ## Documentation
> > >
> > > The document of Arrow UDF is hosted at
> > > https://docs.rs/arrow-udf/latest/arrow_udf/.
> > >
> > > ## Initial Source
> > >
> > > The project currently holds a GitHub repository and multiple packages:
> > >
> > > - https://github.com/risingwavelabs/arrow-udf
> > >
> > > Rust:
> > >
> > > - https://crates.io/arrow-udf/
> > > - https://crates.io/arrow-udf-python/
> > > - https://crates.io/arrow-udf-js/
> > > - https://crates.io/arrow-udf-js-deno/
> > > - https://crates.io/arrow-udf-wasm/
> > >
> > > Python:
> > >
> > > - https://pypi.org/project/arrow-udf/
> > >
> > > Those packages will retain their names, while the repository will be moved to
> > > apache org.
> > >
> > > ## Required Resources
> > >
> > > ### Mailing Lists
> > >
> > > We can reuse the existing mailing lists that arrow have.
> > >
> > > ### Git Repositories
> > >
> > > From
> > >
> > > - https://github.com/risingwavelabs/arrow-udf
> > >
> > > To
> > >
> > > - https://gitbox.apache.org/asf/repos/arrow-udf
> > > - https://github.com/apache/arrow-udf
> > >
> > > ### Issue Tracking
> > >
> > > The project would like to continue using GitHub Issues.
> > >
> > > ### Other Resources
> > >
> > > The project has already chosen GitHub actions as continuous integration
> > > tools.
> > >
> > > ## Initial Committers
> > >
> > > - Runji Wang wangrunji0...@163.com
> > > - Giovanny Gutiérrez
> > > - sundy-li sund...@apache.org
> > > - Xuanwo xua...@apache.org
> > > - Max Justus Spransy maxjus...@gmail.com
> > >
> > > [RisingWave]: https://github.com/risingwavelabs/risingwave
> > > [Databend]: https://github.com/datafuselabs/databend
> > >
> > > Xuanwo
> >
> >