paleolimbot commented on PR #13397:
URL: https://github.com/apache/arrow/pull/13397#issuecomment-1186071656

   Thank you everybody for reviews! There are a lot of threads here so I'm 
going to try to summarise which bits were unresolved and how I think I resolved 
them (or maybe not). The CI failures on r-devel are because of a recent commit 
to r-devel; the CI failure on Windows is a lintr error that I don't know how to 
fix (cyclomatic complexity of an R6 object?).
   
   - @pitrou - The CI failures I thought I saw as a result of your suggestions 
were actually something else, so all those improvements to safe-call-into-r.h 
made it here. Thank you!
   - @pitrou, @nealrichardson, @jorisvandenbossche I reverted the interface to 
match the Python interface more closely. While `register_scalar_function()` is 
perhaps not as intuitive as `register_user_defined_function()`, it sounds like 
there was some discussion about its name and I think it's best to defer to the 
result of that discussion (we can revisit the syntax if it does indeed prove to 
be confusing to R users). This included nixing the "advanced" API...all R 
functions are passed by the user as `function(context, x, y, ...) { ... }`. I 
did add an `auto_convert` argument: if `auto_convert = TRUE`, the arguments and 
output get converted so that the function only has to deal with R objects 
(which makes it substantially easier to write the function).
   - @pitrou I didn't add an R API to specify documentation. If there becomes a 
way that compute function documentation is accessible and useful in R, I think 
that is the right time to add that API to user-defined functions.
   - @nealrichardson I changed the user-facing example to the one you suggested 
(predict method of a linear model). For the tests I kept something simpler 
since there's a few variations that are needed to test all the cases.
   - @nealrichardson expressed concern that because we can't use user-defined 
functions when we return an R-level RecordBatchReader that it may be confusing 
when this fails. I think this only affects `to_duckb()` which is relatively 
new, and @westonpace indicated that this should be possible in the future 
(perhaps some details about that might help with this concern?).
   
   A concern I have that never came up is that there is probably more overhead 
than is needed for each call into R. We can probably get it down to a single 
unwind protect...I think that would be best handled in a follow-up PR with a 
narrow scope but I could in theory try here too.
   
   Example usage with the final API:
   
   <details>
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   #> Some features are not enabled in this build of Arrow. Run `arrow_info()` 
for more information.
   library(dplyr, warn.conflicts = FALSE)
   
   some_model <- lm(mpg ~ disp + cyl, data = mtcars)
   register_scalar_function(
     "mtcars_predict_mpg",
     function(context, disp, cyl) {
       predict(some_model, newdata = data.frame(disp, cyl))
     },
     in_type = schema(disp = float64(), cyl = float64()),
     out_type = float64(),
     auto_convert = TRUE
   )
   
   as_arrow_table(mtcars) %>%
     transmute(mpg, mpg_predicted = mtcars_predict_mpg(disp, cyl)) %>%
     collect() %>%
     head()
   #>    mpg mpg_predicted
   #> 1 21.0      21.84395
   #> 2 21.0      21.84395
   #> 3 22.8      26.08886
   #> 4 21.4      19.82676
   #> 5 18.7      14.55267
   #> 6 18.1      20.50602
   ```
   
   <sup>Created on 2022-07-15 by the [reprex 
package](https://reprex.tidyverse.org) (v2.0.1)</sup>
   
   </details>
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to