westonpace commented on pull request #10118:
URL: https://github.com/apache/arrow/pull/10118#issuecomment-841861097


   This PR could use some advice from the R community.  I'm adding the ability 
to request async.  At the moment async is a performance degradation in some 
cases when I/O is really fast, so until we've made more progress there it will 
need to be optional.  I've added `UseAsync` to the scanner in R, which is used, 
for example, like this:
   
   ```
   test_that("Scanner$ScanBatches", {
     ds <- open_dataset(ipc_dir, format = "feather")
     batches <- ds$NewScan()$Finish()$ScanBatches()
     table <- Table$create(!!!batches)
     expect_equivalent(as.data.frame(table), rbind(df1, df2))
   
     batches <- ds$NewScan()$UseAsync(TRUE)$Finish()$ScanBatches()
     table <- Table$create(!!!batches)
     expect_equivalent(as.data.frame(table), rbind(df1, df2))
   })
   ```
   However, most of the examples I see reading a dataset are doing something 
like...
   
   ```
   ds %>%
         select(string = chr, integer = int) %>%
         filter(integer > 6 & integer < 11) %>%
         collect() %>%
         summarize(mean = mean(integer))
   ```
   
   How should `UseAsync` be inserted into such a pattern (chain?) of calls?  
Should it be its own operator:
   
   ```
   ds %>%
         select(string = chr, integer = int) %>%
         filter(integer > 6 & integer < 11) %>%
         use_async() %>%
         collect() %>%
         summarize(mean = mean(integer))
   ```
   
   ...or an argument to `collect`:
   
   ```
   ds %>%
         select(string = chr, integer = int) %>%
         filter(integer > 6 & integer < 11) %>%
         collect(use_async=TRUE) %>%
         summarize(mean = mean(integer))
   ```
   ...or exposed some other way?
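   
   To make the two options concrete, here is a rough base-R sketch using a 
mock query object.  None of this is the real arrow implementation: 
`mock_query`, `use_async`, and `collect_query` are hypothetical names made up 
for illustration, and the calls are nested rather than piped so the snippet 
does not depend on any package.
   
   ```r
   # Mock "query" object standing in for a lazy dataset query. This is NOT
   # arrow's real implementation; it only carries the data and one flag.
   mock_query <- function(data) {
     structure(list(data = data, use_async = FALSE), class = "mock_query")
   }

   # Option 1 (hypothetical): a standalone verb that sets a flag on the query
   # and returns it, so it can sit anywhere in the chain before collecting.
   use_async <- function(query, value = TRUE) {
     query$use_async <- value
     query
   }

   # Hypothetical collect step. An explicit argument (option 2) overrides the
   # flag stored on the query; NULL means "use whatever the query says".
   collect_query <- function(query, use_async = NULL) {
     async <- if (is.null(use_async)) query$use_async else use_async
     # A real implementation would configure the scanner at this point,
     # e.g. by calling UseAsync(async) on the scanner builder.
     attr(query$data, "scanned_async") <- async
     query$data
   }

   df <- data.frame(int = 1:10)

   # Option 1: separate operator in the chain
   res1 <- collect_query(use_async(mock_query(df)))

   # Option 2: argument to the collect step
   res2 <- collect_query(mock_query(df), use_async = TRUE)
   ```
   
   In this sketch both options end up setting the same flag before the scan, 
so the choice is mostly about how the option should read in a chain of calls.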



