nealrichardson commented on code in PR #41223:
URL: https://github.com/apache/arrow/pull/41223#discussion_r1577888964


##########
r/NEWS.md:
##########
@@ -19,6 +19,9 @@
 
 # arrow 16.0.0.9000
 
+* R functions that users write that use functions that Arrow supports in 
dataset queries now can be used in queries too. Previously, only functions that 
used arithmetic operators worked. For example, `time_hours <- function(mins) 
mins / 60` worked, but `time_hours_rounded <- function(mins) round(mins / 60)` 
did not; now both work. These are not true user-defined functions (UDFs); for 
those, see `register_scalar_function()`. (#41223)

Review Comment:
   > Late to looking at this but maybe the specifics around this can be 
documented better in user-facing help pages. If there are "true UDFs" (which 
these aren't), what would we call these? 
   
   🤷 I'm not sure there is an accepted term of art for this.
   
   > Further, it looks like this doesn't pull the data into R, is my read 
correct?
   
   Correct. In this case, `time_hours_rounded(x)` will evaluate to something 
like
   
   ```r
   Expression$create("round",
     args = list(
       Expression$create("divide_checked",
         args = list(Expression$field_ref("x"), Expression$scalar(60)))
     )
   )
   ```
   
   Acero never sees a function named `time_hours_rounded`, and no data gets 
pulled into R to compute it; the whole thing is built from expressions that 
Acero supports.
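   
   To make that concrete, here is a minimal sketch (not taken from the PR), 
assuming an arrow build that includes this change plus dplyr. The table and 
its `mins` column are made up for illustration.
   
   ```r
   library(arrow)
   library(dplyr)

   # Plain R helper from the NEWS entry; it is never registered with Arrow.
   time_hours_rounded <- function(mins) round(mins / 60)

   # Hypothetical in-memory Arrow table standing in for a dataset.
   tbl <- arrow_table(mins = c(60, 125, 200))

   # mutate() only builds a lazy query; nothing is computed at this point.
   query <- tbl |> mutate(hours = time_hours_rounded(mins))

   # Print the Acero plan. It should mention only Arrow compute functions,
   # not the R helper's name.
   show_exec_plan(query)

   # Evaluation happens in Acero; only the result comes back to R.
   result <- collect(query)
   ```
   
   On a build without this change, I'd expect the `mutate()` step to fall 
back or error instead, since `round()` inside a user function wasn't 
translated before.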



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
