OfekShilon opened a new issue, #15271:
URL: https://github.com/apache/arrow/issues/15271

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Test script that measures R/arrow load time for various sizes:
   ```r
   colnums <- c(10,20,30,100,150,200,300,500)
   rownums <- c(1,2,3,4,5,10,20,30,40,50,60,70,100,200, 300, 400, 500, 1000, 2000, 3000, 4000, 5000, 10000)
   
   # Generate files
   for (colnum in colnums) {
     for (rownum in rownums) {
       fn.robj <- paste0("~/tmp/robj.",rownum,"x",colnum)
       fn.arrow <- paste0("~/tmp/arrow.",rownum,"x",colnum)
   
       dat <- as.data.frame(matrix(runif(rownum*colnum), nrow=rownum, ncol=colnum))
       save(dat, file=fn.robj)
       arrow::write_feather(x = dat, sink = fn.arrow)
     }
   }
   
   times.robj <- matrix(0, nrow=length(rownums), ncol=length(colnums))
   rownames(times.robj) <- paste(rownums,"rows")
   colnames(times.robj) <- paste(colnums,"cols")
   times.arrow <- times.robj
   
   for (i in seq_along(rownums)) {
     for (j in seq_along(colnums)) {
       rownum <- rownums[i]
       colnum <- colnums[j]
       fn.robj <- paste0("~/tmp/robj.",rownum,"x",colnum)
       fn.arrow <- paste0("~/tmp/arrow.",rownum,"x",colnum)
   
       # Load each file once untimed, then time the 2nd load,
       # so that cold caches do not skew the measurement.
       load(fn.robj)
       start <- Sys.time()
       load(fn.robj)
       # Force seconds: a bare difftime picks its own units.
       times.robj[i,j] <- as.numeric(difftime(Sys.time(), start, units="secs"))
       
       tst <- arrow::read_feather(fn.arrow)
       start <- Sys.time()
       tst <- arrow::read_feather(fn.arrow)
       times.arrow[i,j] <- as.numeric(difftime(Sys.time(), start, units="secs"))
     }
   }
   ```
   Results:
   ```
   > times.arrow / times.robj
                  10 cols     20 cols     30 cols    100 cols    150 cols    200 cols    300 cols    500 cols
   1 rows      16.1439951  19.7020075  25.1108247  51.1643757  77.1529228  91.3080397 111.3643533 149.3513743
   2 rows      15.0277094  21.2175810  22.2626322  48.8661710  68.6573327 650.6486486 134.8991050 130.5041691
   3 rows      14.6777409  20.1436969  20.9700806  47.7467603  63.9312016  68.5315315  98.5874855 119.4731097
   4 rows      13.2236921  17.4342891  20.9966044  43.8189867  57.1619048  64.3601299  94.4213217 118.8271915
   5 rows      12.6945607  14.8067084  18.7377778  36.4182165  49.6366695  56.7033511  73.2449044 115.0325528
   10 rows     13.1203008  16.9616537  16.7252696  37.5056129  47.2363992  56.1606467  76.4436374  86.6117791
   20 rows     12.4548896 774.0376940  17.5051370  32.4073774  35.6958398  39.4063311  46.5070936  51.8869215
   30 rows     10.2758259  12.8381764  15.6813459  25.9489239  30.6835476  31.7596519  35.4976311  41.5393059
   40 rows     10.8671210   7.8244697  15.1399804  23.4805764  29.2812743  26.6662289  31.4367649  42.6152522
   50 rows     11.3902007  12.6833417  15.2992519  25.2068532  27.2051708  28.9717248  32.0606809  36.8470872
   60 rows     10.9138495  14.1022129  16.6385948  22.7227723  26.6038445  27.9418484  28.5083841  33.9032176
   70 rows     10.7040650  12.1799904  13.2777314  19.7737738  20.8106306  21.8470504  22.5418507  27.6593520
   100 rows    10.7567132  11.7838963  12.8056854  15.0082676  28.4549343  18.1499451  21.5192503  22.0708589
   200 rows     9.5018797  10.1656687  10.6434257  12.3456125  12.0490603  12.5274870  13.1872241  14.6434862
   300 rows     9.6111111   8.9652621   8.9622146   9.3272070   9.1396644  10.0647620  10.6045769  12.0662228
   400 rows     8.7160494   9.3873540   8.3236041   7.2730971   7.9281412   7.4078140   7.4032556   7.9848605
   500 rows     7.1358811   6.4100263   6.4007276   6.0777437   6.6235458   6.2249675   6.3370181   6.9172020
   1000 rows    5.3677043   4.4564087   4.1116463   3.6105644   3.2333922   3.2778293   3.2759320   3.4308380
   2000 rows    3.5031858   2.5319266   2.4289314   1.8577107   1.7995663   1.7371557   1.7497375   1.8541778
   3000 rows    2.5769010   6.3183501   1.7323371   1.3046406   1.2342389   1.2235438   1.3174136   1.2508460
   4000 rows    2.0956563   1.4165296   1.8561829   0.9478190   0.8863266   1.2302510   0.8732958   0.8928616
   5000 rows    1.6759777   1.2119986   1.1039393   0.8229102   1.3977869   0.9786898   0.9761781   0.8342817
   10000 rows   0.9136646   0.6621193   0.5184357   0.4271505   0.3822572   0.3574329   0.3735044   0.4495687
   ```
   
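   Is this a known overhead? The table shows `read_feather` time divided by `load` time, so for small data frames arrow is roughly 10x to 150x slower, which seems rather large.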
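   
   The isolated spikes in the table (for example 650.6 at 2 rows x 200 cols and 774.0 at 20 rows x 20 cols) look like single-shot timing noise rather than a real effect. Below is a minimal sketch of a steadier measurement that takes the median over repeated reads. `median_secs` is a hypothetical helper, not part of arrow; the file names, sizes and `reps = 20` are arbitrary choices for illustration, and the arrow part only runs if the package is installed.
   
   ```r
   # Hypothetical helper: median wall-clock seconds over `reps` calls of f().
   median_secs <- function(f, reps = 20) {
     f()  # untimed warm-up call, so cold caches do not affect the timed runs
     elapsed <- vapply(seq_len(reps), function(k) {
       start <- Sys.time()
       f()
       as.numeric(difftime(Sys.time(), start, units = "secs"))
     }, numeric(1))
     median(elapsed)
   }
   
   # One small case; sizes chosen arbitrarily for the example.
   rownum <- 10
   colnum <- 100
   dat <- as.data.frame(matrix(runif(rownum * colnum), nrow = rownum, ncol = colnum))
   
   fn.robj <- tempfile("robj.")
   save(dat, file = fn.robj)
   # Load into a scratch environment so the global `dat` is not overwritten.
   t.robj <- median_secs(function() load(fn.robj, envir = new.env()))
   
   if (requireNamespace("arrow", quietly = TRUE)) {
     fn.arrow <- tempfile("arrow.")
     arrow::write_feather(x = dat, sink = fn.arrow)
     t.arrow <- median_secs(function() arrow::read_feather(fn.arrow))
     print(c(robj = t.robj, arrow = t.arrow, ratio = t.arrow / t.robj))
   }
   ```
   
   Using the median instead of a single run should damp outliers caused by garbage collection or other background activity, although it does not change the overall trend.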
   
   
   ### Component(s)
   
   R

