gongcastro opened a new issue, #35431:
URL: https://github.com/apache/arrow/issues/35431

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Hi! I wanted to create a variable in a data frame with the cumulative counts 
of some other variable. 
   
   Without using Arrow, I get what I need:
   
   ```r
   library(dplyr)
   library(tibble)
   
   mtcars |> 
     rownames_to_column("model") |>
     select(model, cyl) |> 
     group_by(cyl) |> 
     mutate(seq_counts = 1:n())
   ```
   
   Which returns:
   
   ```
   # A tibble: 32 × 3
      model               cyl seq_counts
      <chr>             <dbl>      <int>
    1 Mazda RX4             6          1
    2 Mazda RX4 Wag         6          2
    3 Datsun 710            4          1
    4 Hornet 4 Drive        6          3
    5 Hornet Sportabout     8          1
    6 Valiant               6          4
    7 Duster 360            8          2
    8 Merc 240D             4          2
    9 Merc 230              4          3
   10 Merc 280              6          5
   ```
   
   Since Arrow does not support `n()` yet, I'm using `to_duckdb()` to continue 
the pipeline (I'm using `mtcars` here for minimal reproducibility, but my 
actual dataset is much larger, hence the need to use Arrow/DuckDB). But when 
I run the same code after `to_duckdb()`, I get the following error:
   
   ```r
   mtcars |> 
     rownames_to_column("model") |>
     to_duckdb() |>
     select(model, cyl) |> 
     group_by(cyl) |> 
     mutate(seq_counts = 1:n())
   ```
   
   ```
   Error in `purrr::pmap()`:
   ℹ In index: 3.
   ℹ With name: seq_counts.
   Caused by error in `from:to`:
   ! NA/NaN argument
   Run `rlang::last_trace()` to see where the error occurred.
   Warning message:
   In 1:n() : NAs introduced by coercion
   ```
   I encounter the same error when defining `n()` in a separate variable (e.g., 
`mutate(n_total = n(), seq_counts = 1:n_total)`) and when using `seq()` instead 
of `:` to build the sequence.
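   
   A possible workaround, sketched here and not confirmed as the intended fix: 
the warning suggests that `1:n()` is evaluated in R and not translated to SQL, 
whereas `row_number()` is translated by dbplyr to a window function. The 
`row_id` column below is a hypothetical helper that records the original row 
order (a SQL result has no guaranteed order otherwise), and `window_order()` 
comes from dbplyr. This assumes the arrow, duckdb and dbplyr packages are 
installed.
   
   ```r
   library(dplyr)
   library(tibble)
   library(arrow)  # provides to_duckdb(); duckdb and dbplyr must be installed
   
   cars <- mtcars |> 
     rownames_to_column("model") |>
     mutate(row_id = row_number())  # hypothetical helper: original row order
   
   # Reference result computed locally; within groups, row_number() gives
   # the same sequence as 1:n()
   expected <- cars |> 
     select(row_id, model, cyl) |> 
     group_by(cyl) |> 
     mutate(seq_counts = row_number()) |> 
     ungroup()
   
   # Same computation pushed to DuckDB as a window function
   result <- cars |> 
     to_duckdb() |>
     select(row_id, model, cyl) |> 
     group_by(cyl) |> 
     dbplyr::window_order(row_id) |>   # order rows within each cyl group
     mutate(seq_counts = row_number()) |> 
     ungroup() |> 
     collect() |> 
     arrange(row_id)                   # restore the original row order
   ```
   
   The column type returned by DuckDB for `seq_counts` may differ from the 
local integer column, so it may need an `as.integer()` after `collect()`.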
   
   Thanks!
   
   This is my `sessionInfo()`:
   
   ```
   R version 4.2.2 (2022-10-31 ucrt)
   Platform: x86_64-w64-mingw32/x64 (64-bit)
   Running under: Windows 10 x64 (build 22621)
   
   Matrix products: default
   
   locale:
   [1] LC_COLLATE=Spanish_Spain.utf8  LC_CTYPE=Spanish_Spain.utf8
   [3] LC_MONETARY=Spanish_Spain.utf8 LC_NUMERIC=C
   [5] LC_TIME=Spanish_Spain.utf8
   
   attached base packages:
   [1] stats     graphics  grDevices utils     datasets  methods   base
   
   other attached packages:
   [1] arrow_11.0.0.3 tibble_3.2.1   dplyr_1.1.2    devtools_2.4.3 usethis_2.1.5
   
   loaded via a namespace (and not attached):
    [1] pillar_1.9.0      compiler_4.2.2    dbplyr_2.1.1      prettyunits_1.1.1
    [5] remotes_2.4.2     tools_4.2.2       pkgbuild_1.3.1    pkgload_1.3.2
    [9] bit_4.0.5         memoise_2.0.1     lifecycle_1.0.3   pkgconfig_2.0.3
   [13] rlang_1.1.0       cli_3.6.0         DBI_1.1.3         fastmap_1.1.0
   [17] duckdb_0.7.1-1    withr_2.5.0       generics_0.1.3    fs_1.5.2
   [21] vctrs_0.6.2       bit64_4.0.5       tidyselect_1.2.0  glue_1.6.2
   [25] R6_2.5.1          processx_3.8.1    fansi_1.0.3       sessioninfo_1.2.2
   [29] callr_3.7.3       purrr_1.0.1       tzdb_0.3.0        blob_1.2.3
   [33] magrittr_2.0.3    ps_1.7.5          ellipsis_0.3.2    assertthat_0.2.1
   [37] utf8_1.2.2        cachem_1.0.6      crayon_1.5.2
   ```
   
   ### Component(s)
   
   R


