jonkeane commented on issue #45098:
URL: https://github.com/apache/arrow/issues/45098#issuecomment-2564326346

   > My assumption here is that acero works by static analysis -- read the AST, 
apply known translations, i.e. analogous to {dbplyr}
   
   Yup, at a high level this is what's going on. You might have already found 
these, but here are some pointers around `case_when` that might be helpful: we 
[register the 
binding](https://github.com/apache/arrow/blob/352b710f337c52485b6e50ba678a94a13c0e4113/r/R/dplyr-funcs-conditional.R#L100-L101),
 which does some validation and then uses 
[`arrow_eval`](https://github.com/apache/arrow/blob/352b710f337c52485b6e50ba678a94a13c0e4113/r/R/dplyr-eval.R#L18-L87)
 on each of the case formulas. `arrow_eval` is there to create arrow expressions 
(you might not need it if you're not operating on expressions that reference 
columns in a data.frame(-like) object). Then [the arrow expression itself is 
returned](https://github.com/apache/arrow/blob/352b710f337c52485b6e50ba678a94a13c0e4113/r/R/dplyr-funcs-conditional.R#L136-L146).
 If `fcase()` doesn't need the tidy evaluation semantics for selecting columns, 
the binding might be as simple as just using a similar `Expression$create()` on 
the expressions themselves (there's probably a bit more to that, but much of the 
complication with the dplyr bindings is getting the tidy evaluation working).
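
   To make the "read the AST, apply known translations" idea concrete, here is 
a toy sketch in base R. It is not how arrow implements this: `toy_bindings` and 
`toy_translate()` are hypothetical names made up for illustration, and the 
result is just a string rather than a real arrow expression.

   ```r
   # Hypothetical lookup table: R function name -> translator for that call
   toy_bindings <- list(
     "<" = function(a, b) sprintf("less(%s, %s)", a, b),
     "+" = function(a, b) sprintf("add(%s, %s)", a, b),
     "ifelse" = function(test, yes, no) sprintf("if_else(%s, %s, %s)", test, yes, no)
   )

   # Walk the AST recursively and apply the matching translator to each call
   toy_translate <- function(expr) {
     if (is.name(expr)) {
       # bare symbols are treated as column references
       return(sprintf("field(%s)", as.character(expr)))
     }
     if (!is.call(expr)) {
       # literals (numbers, strings) are passed through as-is
       return(deparse(expr))
     }
     if (!is.name(expr[[1]])) {
       stop("only calls to plain function names are handled", call. = FALSE)
     }
     fn <- as.character(expr[[1]])
     binding <- toy_bindings[[fn]]
     if (is.null(binding)) {
       stop("no binding registered for `", fn, "`", call. = FALSE)
     }
     args <- lapply(as.list(expr)[-1], toy_translate)
     do.call(binding, unname(args))
   }

   toy_translate(quote(ifelse(x + 1 < 3, "low", "high")))
   # "if_else(less(add(field(x), 1), 3), \"low\", \"high\")"
   ```

   A call with no entry in the table (say `sqrt(x)`) errors out, which is the 
toy equivalent of a function that has no binding.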
   
   The [tests for 
`case_when`](https://github.com/apache/arrow/blob/352b710f337c52485b6e50ba678a94a13c0e4113/r/tests/testthat/test-dplyr-funcs-conditional.R#L178-L218)
 use a helper called 
[`compare_dplyr_binding`](https://github.com/apache/arrow/blob/352b710f337c52485b6e50ba678a94a13c0e4113/r/tests/testthat/helper-expectation.R#L70-L83),
 which in this case is a little ill-named, but should work so long as the 
input data.frame is `.input` in the test code.
   
   All of that said, because we are looking through the AST to find bindings, 
if `fcase` had an AST that was made up of expressions that already had bindings 
(either base R or even a remap to `case_when`), in this circumstance it 
_should_ work with arrow without even needing any changes in the arrow package 
itself.
   
   
   Something like the following _should_ work:
   
   ```r
   fcase_arrow <- function(..., default = NA) {
     # collect the alternating condition/value arguments
     dots <- rlang::list2(...)
     # for each pair of arguments, make a formula expression
     formulae <- lapply(
       seq(1, length(dots), by = 2),
       function(i) rlang::expr(dots[[!!i]] ~ dots[[!!(i + 1)]])
     )
     # splice the formulae into case_when(), which already has a binding
     dplyr::case_when(!!!formulae, .default = default)
   }
   ```
   
   You could also build up the acero expressions directly with 
`Expression$create("case_when", ...)` rather than using `rlang`'s expression 
and dots management there.
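
   A rough, untested-against-a-real-query sketch of what that might look like. 
`fcase_expr()` is a hypothetical name, and packing the conditions into a 
`make_struct` call with `field_names` is my reading of what the linked 
`case_when` binding returns, so treat that part as an assumption. It expects 
the conditions and values to already be arrow `Expression`s.

   ```r
   library(arrow)

   # Hypothetical helper: alternating (condition, value) Expressions in `...`
   fcase_expr <- function(..., default = NULL) {
     dots <- list(...)
     stopifnot(length(dots) >= 2, length(dots) %% 2 == 0)
     conds <- dots[seq(1, length(dots), by = 2)]
     values <- dots[seq(2, length(dots), by = 2)]
     if (!is.null(default)) {
       # a final always-TRUE condition acts as the fallback case
       conds <- c(conds, list(Expression$scalar(TRUE)))
       values <- c(values, list(Expression$scalar(default)))
     }
     # the case_when compute function takes a struct of boolean conditions
     # followed by one value per condition
     cond_struct <- Expression$create(
       "make_struct",
       args = conds,
       options = list(field_names = as.character(seq_along(conds)))
     )
     Expression$create("case_when", args = c(list(cond_struct), values))
   }

   x <- Expression$field_ref("x")
   expr <- fcase_expr(
     Expression$create("less", x, Expression$scalar(3)), Expression$scalar("low"),
     Expression$create("less", x, Expression$scalar(7)), Expression$scalar("mid"),
     default = "high"
   )
   ```

   This skips the validation the real binding does (for example checking that 
each condition is logical), so it is only a starting point.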


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
