kendonB opened a new issue, #41644:
URL: https://github.com/apache/arrow/issues/41644
### Describe the enhancement requested
My Python colleagues report good performance when reading Arrow files that
contain fixed-size list columns (`arrow_fixed_size_list` in R). In R, however,
reading such a list column takes around 40x as long as reading a string column
from the same file (3.58 s vs. 88.1 ms median):
```r
# .x: path to the Arrow IPC file being read

# string column
bench::mark(arrow::read_ipc_file(.x, col_select = c("uid")))
#> # A tibble: 1 × 13
#>   expression                min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result   memory     time       gc
#>   <bch:expr>             <bch:> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>   <list>     <list>     <list>
#> 1 "arrow::read_ipc_fil…  82.5ms 88.1ms      11.4    47.1KB        0     6     0      525ms <tibble> <Rprofmem> <bench_tm> <tibble>

# list column
bench::mark(arrow::read_ipc_file(.x, col_select = c("N_intensity_coefficients")))
#> # A tibble: 1 × 13
#>   expression                min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result   memory     time       gc
#>   <bch:expr>              <bch> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>   <list>     <list>     <list>
#> 1 "arrow::read_ipc_file…  3.58s  3.58s     0.279    15.3MB        0     1     0      3.58s <tibble> <Rprofmem> <bench_tm> <tibble>
```
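For anyone trying to reproduce this without the original file, here is a minimal sketch that writes a synthetic IPC file with one string column and one fixed-size list column, then benchmarks reading each one. The row count `n`, list length `k`, the `uid` format and the random values are all assumptions; only the column names mirror the benchmark above. It also assumes `Array$create()` converts an R list of equal-length numeric vectors into a `fixed_size_list<double>` array when given a `fixed_size_list_of()` type.

```r
# Hypothetical reprex: synthetic data, not the original file.
library(arrow)

n <- 100000L  # number of rows (assumed)
k <- 8L       # length of every list element (assumed)

set.seed(1)
uid <- sprintf("id_%06d", seq_len(n))
coefs <- lapply(seq_len(n), function(i) runif(k))

tbl <- arrow_table(
  uid = uid,
  # Store the list column as fixed_size_list<double, k>
  N_intensity_coefficients = Array$create(coefs, type = fixed_size_list_of(float64(), k))
)

path <- tempfile(fileext = ".arrow")
write_ipc_file(tbl, path)

# check = FALSE because the two expressions return different columns
bench::mark(
  string = read_ipc_file(path, col_select = c("uid")),
  fixed_size_list = read_ipc_file(path, col_select = c("N_intensity_coefficients")),
  check = FALSE
)
```

If the gap reproduces here, the slowdown is presumably in converting the fixed-size list column to R rather than in anything specific to the original file.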
Is this a known issue?
### Component(s)
R