Dewey Dunnington created ARROW-17148:
----------------------------------------
Summary: [R] Improve evaluation of R functions from C++
Key: ARROW-17148
URL: https://issues.apache.org/jira/browse/ARROW-17148
Project: Apache Arrow
Issue Type: Improvement
Components: R
Reporter: Dewey Dunnington
There are currently a few places where we call R code from C++. After
ARROW-16444 and ARROW-16703 there will be more, including some where the
overhead of calling into R may exceed the time it takes to evaluate the
function itself, or where the function is called in a tight loop.
The current approach uses {{cpp11::function}}. This is totally fine and safe,
but it generates some ugly backtraces on error and is potentially slower than
the lean-and-mean approach of purrr (whose entire job is to call R functions in
a loop and which has been heavily optimized). The purrr approach is to
construct the {{call()}} and calling environment in advance and then just run
{{Rf_eval(call, env)}} in the loop. This makes fewer R API calls per iteration,
so it should be faster, and it generates better backtraces (e.g., {{Error in
fun(arg1, arg2)}} instead of {{Error in (function(a, b) { ...the whole content
of the function ... })(every, deparsed, argument)}}).
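As a rough illustration of the purrr-style approach described above, here is a minimal sketch against the R C API. It is not the arrow or purrr implementation: the helper name {{call_in_loop}}, its signature, and the binding names {{fun}}, {{arg1}} and {{arg2}} are hypothetical, and it assumes the caller passes in a fresh environment (e.g. {{new.env()}}). It needs the R headers and is meant to be invoked from R via {{.Call()}}.

```cpp
#include <Rinternals.h>

// Hypothetical helper (not part of arrow or purrr), callable via .Call().
// Evaluates fun(arg1, arg2) n times in `env` and returns the results as a list.
// `env` is assumed to be a fresh environment created by the caller.
extern "C" SEXP call_in_loop(SEXP fun, SEXP arg1, SEXP arg2, SEXP env,
                             SEXP n_sexp) {
  const int n = Rf_asInteger(n_sexp);
  if (n == NA_INTEGER || n < 0) {
    Rf_error("n must be a non-negative integer");
  }

  // Bind the function and its arguments to short names once, up front.
  // If the arguments changed per iteration, they would be rebound with
  // Rf_defineVar() inside the loop instead of rebuilding the call.
  Rf_defineVar(Rf_install("fun"), fun, env);
  Rf_defineVar(Rf_install("arg1"), arg1, env);
  Rf_defineVar(Rf_install("arg2"), arg2, env);

  // Build the call fun(arg1, arg2) once, out of symbols only, so that an
  // error is reported against this short call rather than a deparsed closure.
  SEXP call = PROTECT(
      Rf_lang3(Rf_install("fun"), Rf_install("arg1"), Rf_install("arg2")));

  SEXP out = PROTECT(Rf_allocVector(VECSXP, n));
  for (int i = 0; i < n; i++) {
    // The only R API work per iteration is the evaluation itself.
    SET_VECTOR_ELT(out, i, Rf_eval(call, env));
  }

  UNPROTECT(2);
  return out;
}
```

Note that {{Rf_eval()}} performs a longjmp on an R error, which skips C++ destructors. In cpp11-based code the evaluation would need to go through something like {{cpp11::safe[Rf_eval](call, env)}} or {{cpp11::unwind_protect()}}, and whether that wrapper cancels out the savings is part of what a benchmark would have to show.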
Before optimizing that heavily we should of course benchmark to see exactly how
much that matters!