jonkeane commented on pull request #10730:
URL: https://github.com/apache/arrow/pull/10730#issuecomment-910562170


   OK, I just ran the Array-to-R conversion benchmarks locally, and the results are very 
impressive. I've added other types (even though they are not (yet) backed by 
altrep), and I've tested both chunked and non-chunked arrays, with nulls 
and without.
   
   TL;DR:
   
   For floats + ints we see huge speed-ups (~10x faster), both with and without 
nulls, and for both arrays and chunked arrays (at least in the case where there 
is a single chunk*).
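   For reference, a micro-benchmark along these lines can be sketched with the 
`bench` package (this is a minimal sketch, not the actual benchmark fixtures; 
the array size and null count here are illustrative):

   ``` r
   library(arrow)
   library(bench)

   # A double vector with some nulls (illustrative, not a real fixture)
   x <- c(rnorm(1e6), rep(NA_real_, 10))
   arr <- Array$create(x)

   # Time the Array -> R vector conversion; with altrep the conversion
   # can wrap the Arrow buffer instead of copying it
   bench::mark(as.vector(arr))
   ```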
   
   I'm still investigating what's going on with the two fannie parquet file 
reads that show up as regressions. If anything, this should have sped them up. 
   
   * – we don't yet have fixtures with chunked arrays that have multiple chunks, 
but I simulated this with a CSV (which does come in with multiple chunks), and 
multi-chunk arrays aren't (yet) backed by altrep:
   
   ``` r
   library(arrow)
   
   nyctaxi <- read_csv_arrow("~/repos/ab_store/data/nyctaxi_2010-01.csv.gz", as_data_frame = FALSE)
   
   chunked_array <- as.vector(nyctaxi[[5]])
   .Internal(inspect(chunked_array))
   #> @7fb218000000 14 REALSXP g0c7 [REF(4)] (len=14863778, tl=0) 0.75,5.9,4,4.7,0.6,...
   
   array <- as.vector(nyctaxi[[5]]$chunk(1))
   .Internal(inspect(array))
   #> @7fb20dc3a6e0 14 REALSXP g0c0 [REF(65535)] arrow::Array<double, NONULL> len=5738, Array=<0x7fb25d5080d8>
   #>   @7fb20dc3a670 22 EXTPTRSXP g0c0 [REF(4)]
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
