gongcastro opened a new issue, #35431:
URL: https://github.com/apache/arrow/issues/35431
### Describe the bug, including details regarding any error messages, version, and platform.
Hi! I wanted to create a variable in a data frame with the cumulative counts
of some other variable.
Without using Arrow, I get what I need:
```r
library(dplyr)
library(tibble)
mtcars |>
  rownames_to_column("model") |>
  select(model, cyl) |>
  group_by(cyl) |>
  mutate(seq_counts = 1:n())
```
Which returns:
```
# A tibble: 32 × 3
   model               cyl seq_counts
   <chr>             <dbl>      <int>
 1 Mazda RX4             6          1
 2 Mazda RX4 Wag         6          2
 3 Datsun 710            4          1
 4 Hornet 4 Drive        6          3
 5 Hornet Sportabout     8          1
 6 Valiant               6          4
 7 Duster 360            8          2
 8 Merc 240D             4          2
 9 Merc 230              4          3
10 Merc 280              6          5
```
Since Arrow does not support `n()` yet, I'm using `to_duckdb()` to continue
the pipeline (I'm using `mtcars` here for minimal reproducibility, but my
actual dataset is way bigger, therefore the need to use Arrow/DuckDB). But when
using the same code after `to_duckdb()`, I get the following error:
```r
mtcars |>
  rownames_to_column("model") |>
  to_duckdb() |>
  select(model, cyl) |>
  group_by(cyl) |>
  mutate(seq_counts = 1:n())
```
```
Error in `purrr::pmap()`:
ℹ In index: 3.
ℹ With name: seq_counts.
Caused by error in `from:to`:
! NA/NaN argument
Run `rlang::last_trace()` to see where the error occurred.
Warning message:
In 1:n() : NAs introduced by coercion
```
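My guess at the cause (an assumption, not verified against the dbplyr source): on a database-backed table, `n()` is translated into a SQL fragment rather than evaluated to a number, so `:` ends up trying to coerce text to numeric. The base R sketch below reproduces the same error message using a made-up string (`sql_fragment` is hypothetical and only stands in for whatever the translation actually produces):

```r
# Hypothetical stand-in for the SQL text that a translated n() might yield
sql_fragment <- "COUNT(*) OVER (PARTITION BY cyl)"

# `:` coerces its character argument to numeric, which gives NA and then fails
result <- tryCatch(
  suppressWarnings(1:sql_fragment),
  error = function(e) conditionMessage(e)
)

print(result)
```

If this reading is right, the same thing would happen with `seq()` or with an intermediate `n_total` column, since in both cases the sequence is built from something that is not a number at translation time.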
I encounter the same error when defining `n()` in a different variable (e.g.,
`mutate(n_total = n(), seq_counts = 1:n_total)`), and when using `seq()` instead
of `:` to make the sequence.
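A possible workaround, sketched here under the assumption that dbplyr translates `row_number()` into a SQL window function for the DuckDB backend (the `result` name is only for illustration, and I have not confirmed this on every version listed below):

```r
library(dplyr)
library(tibble)
library(arrow)

result <- mtcars |>
  rownames_to_column("model") |>
  to_duckdb() |>
  select(model, cyl) |>
  group_by(cyl) |>
  # row_number() is numbered within each cyl group on the database side
  mutate(seq_counts = row_number()) |>
  # pull the result back into R as a tibble
  collect()
```

Without an explicit ordering, the numbering within each group is not guaranteed to follow the original row order, so this is not a strict equivalent of `1:n()` on an in-memory data frame.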
Thanks!
This is my `sessionInfo()`:
```
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)
Matrix products: default
locale:
[1] LC_COLLATE=Spanish_Spain.utf8 LC_CTYPE=Spanish_Spain.utf8
[3] LC_MONETARY=Spanish_Spain.utf8 LC_NUMERIC=C
[5] LC_TIME=Spanish_Spain.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] arrow_11.0.0.3 tibble_3.2.1 dplyr_1.1.2 devtools_2.4.3 usethis_2.1.5
loaded via a namespace (and not attached):
[1] pillar_1.9.0 compiler_4.2.2 dbplyr_2.1.1 prettyunits_1.1.1
[5] remotes_2.4.2 tools_4.2.2 pkgbuild_1.3.1 pkgload_1.3.2
[9] bit_4.0.5 memoise_2.0.1 lifecycle_1.0.3 pkgconfig_2.0.3
[13] rlang_1.1.0 cli_3.6.0 DBI_1.1.3 fastmap_1.1.0
[17] duckdb_0.7.1-1 withr_2.5.0 generics_0.1.3 fs_1.5.2
[21] vctrs_0.6.2 bit64_4.0.5 tidyselect_1.2.0 glue_1.6.2
[25] R6_2.5.1 processx_3.8.1 fansi_1.0.3 sessioninfo_1.2.2
[29] callr_3.7.3 purrr_1.0.1 tzdb_0.3.0 blob_1.2.3
[33] magrittr_2.0.3 ps_1.7.5 ellipsis_0.3.2 assertthat_0.2.1
[37] utf8_1.2.2 cachem_1.0.6 crayon_1.5.2
```
### Component(s)
R