nealrichardson commented on code in PR #41223:
URL: https://github.com/apache/arrow/pull/41223#discussion_r1577888964
##########
r/NEWS.md:
##########
@@ -19,6 +19,9 @@
# arrow 16.0.0.9000
+* R functions that users write that use functions that Arrow supports in
dataset queries now can be used in queries too. Previously, only functions that
used arithmetic operators worked. For example, `time_hours <- function(mins)
mins / 60` worked, but `time_hours_rounded <- function(mins) round(mins / 60)`
did not; now both work. These are not true user-defined functions (UDFs); for
those, see `register_scalar_function()`. (#41223)
Review Comment:
> Late to looking at this but maybe the specifics around this can be
> documented better in user-facing help pages. If there are "true UDFs" (which
> these aren't), what would we call these?
🤷 I'm not sure there is an accepted term of art for this.
> Further, it looks like this doesn't pull the data into R, is my read
> correct?
Correct. In this case, `time_hours_rounded(x)` will evaluate to something
like
```r
Expression$create(
  "round",
  args = list(
    Expression$create(
      "divide_checked",
      args = list(Expression$field_ref("x"), Expression$scalar(60))
    )
  )
)
```
Acero never sees a function named `time_hours_rounded`, and no data gets
pulled into R to compute it; it's all built from expressions that Acero
supports.
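One way to check this from R is sketched below. It assumes an arrow build that includes this change, with dplyr installed. The in-memory table and its `mins` column are made up for illustration and are not from the PR.

```r
library(arrow)
library(dplyr)

# The helper from the NEWS entry: plain R, built only from functions
# that arrow knows how to translate into expressions.
time_hours_rounded <- function(mins) round(mins / 60)

# Hypothetical in-memory table standing in for a dataset.
tab <- arrow_table(mins = c(45, 100, 200))

# Build the query lazily; nothing is computed yet.
query <- tab |>
  mutate(hours = time_hours_rounded(mins))

# Printing the query before collect() shows the expression that will be
# evaluated; the helper's own name should not appear in it.
print(query)

# Only collect() materializes the result as an R data frame.
result <- collect(query)
```

Printing before `collect()` is the quickest way to see what the helper was translated into.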