[I] poor R performance for arrow_fixed_size_list types [arrow]

via GitHub Mon, 13 May 2024 16:31:36 -0700


kendonB opened a new issue, #41644:
URL: https://github.com/apache/arrow/issues/41644


   ### Describe the enhancement requested
   
   My Python colleagues report good performance when reading Arrow files
   containing `arrow_fixed_size_list` columns. In R, however, reading such a
   list column takes around 40x as long as reading a string column (3.58 s vs.
   88.1 ms median in the benchmarks below).
   
   ```r
   # `.x` is the path to the Arrow IPC file being read

   # string column
   bench::mark(arrow::read_ipc_file(.x, col_select = c("uid")))
   #> A tibble: 1 × 13
   #>  expression               min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result   memory     time       gc
   #>  <bch:expr>            <bch:> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>   <list>     <list>     <list>
   #> "arrow::read_ipc_fil… 82.5ms 88.1ms      11.4    47.1KB        0     6     0      525ms <tibble> <Rprofmem> <bench_tm> <tibble>

   # list column
   bench::mark(arrow::read_ipc_file(.x, col_select = c("N_intensity_coefficients")))
   #> A tibble: 1 × 13
   #>  expression               min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result   memory     time       gc
   #>  <bch:expr>             <bch> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>   <list>     <list>     <list>
   #>1 "arrow::read_ipc_file… 3.58s  3.58s     0.279    15.3MB        0     1     0      3.58s <tibble> <Rprofmem> <bench_tm> <tibble>
   ```
   
   Is this a known issue?
   
   ### Component(s)
   
   R


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
