Neal Richardson created ARROW-17437:
---------------------------------------
Summary: [R][C++] Scalar UDFs don't actually deal with scalars
Key: ARROW-17437
URL: https://issues.apache.org/jira/browse/ARROW-17437
Project: Apache Arrow
Issue Type: Bug
Components: C++, R
Reporter: Neal Richardson

Noted while testing out UDFs in R. I was wrapping a {{system()}} call in a UDF to shell out and capture the stdout for each value in the data, but I ended up getting the same result for all rows. After some exploration, I figured out that the problem was that the data passed into the UDF is actually a vector, so unless the R function is properly vectorized, you'll get unexpected results.

Here's an example that illustrates:

{code}
library(arrow)
library(dplyr)  # for transmute()

register_scalar_function(
  "test",
  function(context, x) paste(x, collapse = ","),
  utf8(),
  utf8(),
  auto_convert = TRUE
)

Table$create(x = c("a", "b", "c")) |>
  transmute(test(x)) |>
  collect()
# # A tibble: 3 × 1
#   `test(x)`
#   <chr>
# 1 a,b,c
# 2 a,b,c
# 3 a,b,c
{code}

Basically, the UDF receives the whole chunk of data at once and returns a single Scalar, which then gets recycled for all rows.
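A minimal base-R sketch of the behavior described in this report, assuming (as stated above) that the UDF is called once with the whole chunk as a vector and that a length-1 result is recycled to every row. `call_like_udf()` is a hypothetical stand-in for Arrow's invocation, not an Arrow API; `vectorized` shows one way to make the function return one value per input element.

```r
# Hypothetical stand-in for the invocation described in the report:
# the whole chunk is passed in at once, and the result is recycled
# to the chunk length (so a length-1 result fills every row).
call_like_udf <- function(f, chunk) {
  out <- f(chunk)
  rep_len(out, length(chunk))
}

# Not vectorized: collapses the entire chunk into one string.
non_vectorized <- function(x) paste(x, collapse = ",")

# Vectorized: applies the per-value logic element by element,
# returning exactly one string per input value.
vectorized <- function(x) {
  vapply(x, function(v) paste(v, collapse = ","), character(1),
         USE.NAMES = FALSE)
}

chunk <- c("a", "b", "c")
print(call_like_udf(non_vectorized, chunk))  # every row gets "a,b,c"
print(call_like_udf(vectorized, chunk))      # one value per row
```

With arrow, the same element-wise body would go inside the `function(context, x)` passed to `register_scalar_function()` in the example above; the same pattern would apply to wrapping a per-value {{system()}} call.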