jonkeane commented on issue #45098:
URL: https://github.com/apache/arrow/issues/45098#issuecomment-2564326346
> My assumption here is that acero works by static analysis -- read the AST,
apply known translations, i.e. analogous to {dbplyr}
Yup, at a high level this is what's going on. You might have already found
these, but here are some pointers around `case_when` that might be helpful: we
[register the
binding](https://github.com/apache/arrow/blob/352b710f337c52485b6e50ba678a94a13c0e4113/r/R/dplyr-funcs-conditional.R#L100-L101)
which does some validation and then uses
[`arrow_eval`](https://github.com/apache/arrow/blob/352b710f337c52485b6e50ba678a94a13c0e4113/r/R/dplyr-eval.R#L18-L87)
on each of the case formulas. This `arrow_eval` creates arrow expressions
(you might not need it if you're not operating on expressions that reference
columns in a data.frame(-like) object). Then [the arrow expression itself is
returned](https://github.com/apache/arrow/blob/352b710f337c52485b6e50ba678a94a13c0e4113/r/R/dplyr-funcs-conditional.R#L136-L146).
If `fcase()` doesn't need the tidy evaluation semantics for selecting columns,
the binding might be as simple as using a similar `Expression$create()` on
the expressions themselves (there's probably a bit more to it than that, but
much of the complication with the dplyr bindings is getting the tidy
evaluation working).
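To make the translation step concrete, here is a minimal sketch of an existing binding at work, assuming the `arrow` and `dplyr` packages are installed. The table and the column name `x` are made up for illustration.

```r
library(arrow)
library(dplyr)

# Hypothetical example data; the column name `x` is invented for this sketch
tbl <- arrow_table(x = c(1, 5, 10))

# Nothing is computed yet: arrow walks the call and swaps in its own
# case_when binding, recording an arrow expression on the query
query <- tbl %>%
  mutate(size = case_when(
    x < 3 ~ "small",
    x < 8 ~ "medium",
    TRUE ~ "large"
  ))

# collect() runs the query and returns a tibble
result <- collect(query)
```

Because `case_when` already has a binding, the R function itself never runs on the data; only the translated expression does.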
The [tests for
`case_when`](https://github.com/apache/arrow/blob/352b710f337c52485b6e50ba678a94a13c0e4113/r/tests/testthat/test-dplyr-funcs-conditional.R#L178-L218)
use a helper called
[`compare_dplyr_binding`](https://github.com/apache/arrow/blob/352b710f337c52485b6e50ba678a94a13c0e4113/r/tests/testthat/helper-expectation.R#L70-L83),
which in this case is a little ill-named, but should work, so long as the
input data.frame is `.input` in the test code.
All of that said, because we are looking through the AST to find bindings,
if `fcase` had an AST made up of expressions that already have bindings
(either base R or even a remap to `case_when`), it _should_ work with arrow
without needing any changes in the arrow package itself.
Something like the following _should_ work:
```
fcase_arrow <- function(..., default = NA) {
  dots <- rlang::list2(...)
  # for each pair of arguments, make a formula expression
  formulae <- lapply(seq(1, length(dots), by = 2), function(i) {
    rlang::expr(dots[[!!i]] ~ dots[[!!(i + 1)]])
  })
  dplyr::case_when(!!!formulae, .default = default)
}
```
You could also build up the acero expressions directly with
`Expression$create("case_when", ...)` rather than using `rlang`'s expression
and dots management there.
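A rough sketch of that direct route, modeled on my reading of the linked `case_when` binding. Treat the details as assumptions rather than a documented recipe: the column name `x` is hypothetical, and packing the conditions into a `make_struct` call with a `field_names` option is how I understand the `"case_when"` compute function expects its arguments.

```r
library(arrow)

# Hypothetical column `x`; conditions and values are built as arrow expressions
x <- Expression$field_ref("x")
conds <- list(
  Expression$create("less", x, Expression$scalar(3)),
  Expression$create("less", x, Expression$scalar(8))
)
values <- list(Expression$scalar("small"), Expression$scalar("medium"))

# Assumption: "case_when" takes all conditions packed into a single struct,
# followed by one value expression per condition
cond_struct <- Expression$create(
  "make_struct",
  args = conds,
  options = list(field_names = as.character(seq_along(conds)))
)
expr <- Expression$create("case_when", args = c(list(cond_struct), values))
```

This only builds the expression; it still has to be evaluated as part of a query against a table that has an `x` column, which is not shown here.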