[GitHub] [arrow] paleolimbot commented on pull request #33917: GH-33892: [R] Map dplyr::n() to count_all kernel

via GitHub Mon, 30 Jan 2023 17:47:59 -0800


paleolimbot commented on PR #33917:
URL: https://github.com/apache/arrow/pull/33917#issuecomment-1409623266


   Thanks for the reprex...I see about a 1.5x speedup (on an M1 with a slightly 
smaller example). I imagine, but don't know, that anything dataset-related will 
lose some of that speed improvement because of disk IO anyway. In any case, 
the in-memory case is compelling!
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   #> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
   library(dplyr, warn.conflicts = FALSE)
   
   big_table <- arrow_table(z = seq_len(1e8))
   
   bench::mark(
     master = big_table |> 
       mutate(dummy = TRUE) |> 
       summarise(n = sum(dummy)) |> 
       collect(),
     this_pr = big_table |> 
       summarise(n = n()) |> 
       collect(),
     iterations = 10,
     relative = TRUE
   )
   #> # A tibble: 2 × 6
   #>   expression   min median `itr/sec` mem_alloc `gc/sec`
   #>   <bch:expr> <dbl>  <dbl>     <dbl>     <dbl>    <dbl>
   #> 1 master      1.46   1.44      1         17.3     1   
   #> 2 this_pr     1      1         1.44       1       1.44
   ```
   
   <sup>Created on 2023-01-30 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>
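   
   As a sanity check alongside the timings, here is a minimal sketch confirming that the two expressions being benchmarked return the same row count. It assumes an arrow build where `n()` is supported inside `summarise()`; the names `small_table`, `via_sum`, and `via_n` and the 1e5 row count are made up for illustration and are not part of the benchmark above.
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   library(dplyr, warn.conflicts = FALSE)
   
   # Small illustrative table; 1e5 rows is an arbitrary choice
   small_table <- arrow_table(z = seq_len(1e5))
   
   # Workaround style: add a constant TRUE column and sum it
   via_sum <- small_table |> 
     mutate(dummy = TRUE) |> 
     summarise(n = sum(dummy)) |> 
     collect()
   
   # Direct style: count rows with n()
   via_n <- small_table |> 
     summarise(n = n()) |> 
     collect()
   
   # Both approaches should report the number of rows in the table
   stopifnot(
     as.numeric(via_sum$n) == 1e5,
     as.numeric(via_n$n) == 1e5
   )
   ```
   
   The `as.numeric()` calls are only there so the comparison does not depend on which integer type each aggregation comes back as.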


