boshek commented on issue #15056:
URL: https://github.com/apache/arrow/issues/15056#issuecomment-1376600272

   The Feather files are rather large, so for something like this, where so 
much data is going across the wire, CSV or Parquet can actually be quicker. 
That said, it _should_ still work regardless of file format. 
   
   ```
   library(arrow)
   library(dplyr)

   # arrow_bucket, csv_bucket and parquet_bucket are assumed to be defined
   # already (not shown here); each points at the same data in one format.
   microbenchmark::microbenchmark(
     arrow = open_dataset(arrow_bucket, format = 'arrow', unify_schemas = FALSE) %>%
       filter(parameter == 'Ozone',
              sample_duration == '1 HOUR',
              poc == 1) %>%
       collect(),
     csv = open_dataset(csv_bucket, format = 'csv', unify_schemas = FALSE) %>%
       filter(parameter == 'Ozone',
              sample_duration == '1 HOUR',
              poc == 1) %>%
       collect(),
     parquet = open_dataset(parquet_bucket, format = 'parquet', unify_schemas = FALSE) %>%
       filter(parameter == 'Ozone',
              sample_duration == '1 HOUR',
              poc == 1) %>%
       collect(),
     times = 3L
   )
   Unit: seconds
       expr       min        lq      mean    median        uq       max neval
      arrow 16.984894 17.353667 17.822625 17.722439 18.241490 18.760542     3
        csv 10.173464 10.474577 10.876690 10.775691 11.228303 11.680915     3
    parquet  8.599491  8.630459  8.705268  8.661427  8.758156  8.854885     3
    ```
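
   The size difference behind those timings can be checked locally. The sketch below is only an illustration: the data frame is made-up stand-in data (its column names simply mirror the filter above), and the resulting sizes depend on the data and on the default compression of each writer, so they will not match the real bucket contents.

   ```r
   library(arrow)

   # Hypothetical stand-in data; not the dataset from the benchmark above.
   set.seed(1)
   n <- 1e5
   df <- data.frame(
     parameter = sample(c("Ozone", "PM2.5", "NO2"), n, replace = TRUE),
     sample_duration = sample(c("1 HOUR", "24 HOUR"), n, replace = TRUE),
     poc = sample(1:3, n, replace = TRUE),
     value = runif(n)
   )

   # One temporary file per format.
   paths <- c(
     arrow = tempfile(fileext = ".arrow"),
     csv = tempfile(fileext = ".csv"),
     parquet = tempfile(fileext = ".parquet")
   )

   # Write the same data in all three formats using each writer's defaults.
   write_feather(df, paths[["arrow"]])
   write_csv_arrow(df, paths[["csv"]])
   write_parquet(df, paths[["parquet"]])

   # Compare bytes on disk; this is roughly what has to cross the wire
   # when a whole file is read from a remote bucket.
   sizes <- file.size(paths)
   names(sizes) <- names(paths)
   print(sizes)
   ```

   If the Feather file comes out noticeably larger on data shaped like yours, that would be consistent with the slower `arrow` timing above when reading over the network.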
    
   The reason I suggested Linux is that rsc and shinyapps run on Linux 
machines, so maybe this isn't related to RStudio Connect at all. That would be 
good to test. I will see what I can do. 

