dragosmg opened a new pull request, #13513:
URL: https://github.com/apache/arrow/pull/13513
Same goal as #13160, namely to allow the use of namespacing with bindings:
``` r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)
test_df <- tibble(
  date = as.Date(c("2022-03-22", "2021-07-30", NA))
)
test_df %>%
  mutate(ddate = lubridate::as_datetime(date)) %>%
  collect()
#> # A tibble: 3 × 2
#>   date       ddate
#>   <date>     <dttm>
#> 1 2022-03-22 2022-03-22 00:00:00
#> 2 2021-07-30 2021-07-30 00:00:00
#> 3 NA         NA
test_df %>%
  arrow_table() %>%
  mutate(ddate = lubridate::as_datetime(date)) %>%
  collect()
#> # A tibble: 3 × 2
#>   date       ddate
#>   <date>     <dttm>
#> 1 2022-03-22 2022-03-22 00:00:00
#> 2 2021-07-30 2021-07-30 00:00:00
#> 3 NA         NA
```
<sup>Created on 2022-05-14 by the [reprex
package](https://reprex.tidyverse.org) (v2.0.1)</sup>
The approach (option 2):
* each binding is registered only once, as `pkg::fun()` and we change the
way we look up a binding
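To make the "register once, look up differently" idea concrete, here is a minimal base R sketch. It is not {arrow}'s implementation: `registry`, `register_sketch()` and `lookup_sketch()` are hypothetical names, and resolving a bare `fun` by matching the part after `::` is only one possible lookup rule, assumed here for illustration.

```r
# Hypothetical registry: an environment keyed by "pkg::fun" strings.
registry <- new.env(parent = emptyenv())

# Register a binding exactly once, under its qualified name.
register_sketch <- function(qualified_name, fun) {
  stopifnot(grepl("::", qualified_name, fixed = TRUE), is.function(fun))
  assign(qualified_name, fun, envir = registry)
  invisible(qualified_name)
}

# Look a binding up either by "pkg::fun" or by bare "fun".
lookup_sketch <- function(name) {
  keys <- ls(registry, all.names = TRUE)
  # Accept a key if the full key or the part after "::" equals `name`.
  hit <- keys[keys == name | sub("^.*::", "", keys) == name]
  if (length(hit) == 0) {
    # Classed error, using the class name mentioned in this PR.
    stop(structure(
      class = c("arrow-binding-error", "error", "condition"),
      list(message = paste0("no binding registered for ", name), call = NULL)
    ))
  }
  get(hit[[1]], envir = registry)
}

# Illustrative binding body only; the real binding builds an Arrow expression.
register_sketch("lubridate::as_datetime", function(x) as.POSIXct(x, tz = "UTC"))

# Both spellings resolve to the single registered copy.
identical(lookup_sketch("lubridate::as_datetime"), lookup_sketch("as_datetime"))
```

Keeping a single copy per binding means there is nothing to keep in sync between `fun` and `pkg::fun`; the cost is that the lookup has to do slightly more work.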
Steps:
- [x] add functionality to allow binding registration with the `pkg::fun()`
  name;
  - [x] `register_binding()` registers a single, prefixed copy of `fun`,
    `pkg::fun`.
  - [x] Add a binding for the `::` operator, which helps with retrieving
    bindings from the function registry.
  - [x] Add generic unit tests for the `pkg::fun` functionality.
  - [x] Throw a classed error (`"arrow-binding-error"`) if `fun` is not
    found and look it up again, this time as `::fun`. All of the following
    should be able to handle this new error class:
    - [x] `mutate()`
    - [ ] `filter()`
    - [ ] `summarise()`
- [ ] register `nse_funcs` requiring indirect mapping
  - [ ] register each binding with the corresponding `pkg::` prefix.
  - [ ] add / update unit tests for the `nse_funcs` bindings to include at
    least one `pkg::fun()` call for each binding
- [ ] register `nse_funcs` requiring direct mapping (unary and binary
  bindings)
  - [ ] register unary bindings
  - [ ] register binary bindings
  - [ ] add / update unit tests for the `nse_funcs` bindings to include at
    least one `pkg::fun()` call for each binding
- [ ] document changes in the Writing bindings documentation
  - [ ] going forward we should be using `pkg::fun` when defining a
    binding.
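The `::` binding and the classed error from the steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the code in this PR: `registry`, `binding_error()`, `double_colon()` and `eval_with_fallback()` are hypothetical names, and the fallback here simply re-evaluates the expression in plain R, whereas the real dplyr verbs decide for themselves how to react to the error.

```r
# Hypothetical registry holding one illustrative binding.
registry <- list(
  "lubridate::as_datetime" = function(x) paste("binding called with", x)
)

# Build a condition with the class name mentioned in this PR.
binding_error <- function(key) {
  structure(
    class = c("arrow-binding-error", "error", "condition"),
    list(message = paste0("no binding registered for ", key), call = NULL)
  )
}

# Stand-in for a `::` binding: turn `pkg::fun` into a registry key.
double_colon <- function(pkg, fun) {
  key <- paste0(deparse(substitute(pkg)), "::", deparse(substitute(fun)))
  binding <- registry[[key]]
  if (is.null(binding)) stop(binding_error(key))
  binding
}

# Evaluation environment in which `::` resolves to the stand-in above.
mask <- new.env(parent = baseenv())
assign("::", double_colon, envir = mask)

# Try the registry first; on the classed error, fall back to regular R.
eval_with_fallback <- function(expr) {
  tryCatch(
    eval(expr, mask),
    `arrow-binding-error` = function(e) eval(expr, baseenv())
  )
}

# Resolved through the registry (lubridate does not need to be installed).
eval_with_fallback(quote(lubridate::as_datetime("2022-03-22")))
# No binding registered, so the real stats::median() is used instead.
eval_with_fallback(quote(stats::median(c(1, 2, 3))))
```

Because the error carries its own class, a caller can catch exactly this failure and let every other error propagate unchanged.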
Bindings that will not be registered with a `pkg::` prefix:
* type casting, such as `cast()` or `dictionary_encode()`
* operators (e.g. `"!"`, `"=="`, `"!="`, `">"`, `">="`, `"<"`, `"<="`,
`"&"`, etc.)
* aggregating functions (e.g. `sum`, `any`, `all`, `mean`, `sd`, `var`, etc.)
  * extracting all function calls in an expression with `all_funs()`
    currently fails for namespaced calls. For example, in `dplyr::n()`, only
    `::` is identified as a function call by `all_funs()` and {arrow}'s
    `is_function()`.
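The `dplyr::n()` behaviour can be reproduced with base R alone. The sketch below does not call {arrow}'s `all_funs()` or `is_function()`; `function_names()` is a hypothetical helper that approximates "names used in function position" with `all.names()` and `all.vars()`, which is an assumption about how such a helper might work.

```r
# Base R only; dplyr does not need to be installed because the calls are
# quoted and never evaluated.
namespaced <- quote(dplyr::n())
bare <- quote(n())

# The head of the namespaced call is itself a call to `::`, with `dplyr`
# and `n` as its arguments.
head_of_call <- namespaced[[1]]
is.call(head_of_call)
as.character(head_of_call)

# Hypothetical helper: names in function position are all names minus the
# names that appear only as arguments.
function_names <- function(expr) setdiff(all.names(expr), all.vars(expr))

function_names(namespaced)  # "::"
function_names(bare)        # "n"
```

In the namespaced form `n` is an argument to `::` rather than a function name, so a name-based scan never sees it as a call.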