paleolimbot commented on code in PR #13397:
URL: https://github.com/apache/arrow/pull/13397#discussion_r925926179

##########
r/R/compute.R:
##########
@@ -306,3 +306,145 @@ cast_options <- function(safe = TRUE, ...) {
   )
   modifyList(opts, list(...))
 }
+
+#' Register user-defined functions
+#'
+#' These functions support calling R code from query engine execution
+#' (i.e., a [dplyr::mutate()] or [dplyr::filter()] on a [Table] or [Dataset]).
+#' Use [register_scalar_function()] attach Arrow input and output types to an
+#' R function and make it available for use in the dplyr interface and/or
+#' [call_function()]. Scalar functions are currently the only type of
+#' user-defined function supported. In Arrow, scalar functions must be
+#' stateless and return output with the same shape (i.e., the same number
+#' of rows) as the input.
+#'
+#' @param name The function name to be used in the dplyr bindings
+#' @param in_type A [DataType] of the input type or a [schema()]
+#'   for functions with more than one argument. This signature will be used
+#'   to determine if this function is appropriate for a given set of arguments.
+#'   If this function is appropriate for more than one signature, pass a
+#'   `list()` of the above.
+#' @param out_type A [DataType] of the output type or a function accepting
+#'   a single argument (`types`), which is a `list()` of [DataType]s. If a
+#'   function it must return a [DataType].
+#' @param fun An R function or rlang-style lambda expression. The function
+#'   will be called with a first argument `context` which is a `list()`
+#'   with elements `batch_size` (the expected length of the output) and
+#'   `output_type` (the required [DataType] of the output). Subsequent
+#'   arguments are passed by position as specified by `in_types`. If
+#'   `auto_convert` is `TRUE`, subsequent arguments are converted to
+#'   R vectors before being passed to `fun` and the output is automatically
+#'   constructed with the expected output type via [as_arrow_array()].
+#' @param auto_convert Use `TRUE` to convert inputs before passing to `fun`

Review Comment:
I expect `auto_convert = TRUE` to be the far more common choice, and I went back and forth on the default value a few times. I went with `FALSE` because (1) it's what the Python bindings do and (2) forcing a user to opt in to the auto-convert behaviour at least clues them in that there's something magical going on, even if they don't understand exactly what it is. I don't have strong feelings about this; `FALSE` just seemed like the safer default.