lidavidm commented on a change in pull request #10118:
URL: https://github.com/apache/arrow/pull/10118#discussion_r617470663



##########
File path: r/R/dataset-scan.R
##########
@@ -183,6 +191,10 @@ ScannerBuilder <- R6Class("ScannerBuilder", inherit = ArrowObject,
       dataset___ScannerBuilder__UseThreads(self, threads)
       self
     },
+    UseAsync = function(use_async = FALSE) {

Review comment:
       Did you mean to default to TRUE?

##########
File path: python/pyarrow/_dataset.pyx
##########
@@ -2746,6 +2746,10 @@ cdef class Scanner(_Weakrefable):
     use_threads : bool, default True
         If enabled, then maximum parallelism will be used determined by
         the number of available CPU cores.
+    use_async : bool, default False
+        If enabled, the an async scanner will be used that should offer
+        better performance with high-latency/highly-parallel filesystems
+        (e.g. S3)

Review comment:
       The option also needs to be passed through _populate_builder and
Scanner.from_fragment/Scanner.from_dataset; otherwise it won't actually take
effect.
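
       To illustrate the concern, here is a minimal self-contained sketch in
plain Python. FakeScannerBuilder, populate_builder, from_dataset_broken and
from_dataset_fixed are hypothetical stand-ins, not the actual pyarrow
internals. The only point is that an option accepted by a public entry point
is silently ignored unless it is forwarded to the helper that configures the
builder.

```python
class FakeScannerBuilder:
    """Hypothetical stand-in for the ScannerBuilder binding."""

    def __init__(self):
        self.use_threads = True
        self.use_async = False


def populate_builder(builder, use_threads=True, use_async=False):
    # Hypothetical helper: each option has to be copied onto the builder here.
    builder.use_threads = use_threads
    builder.use_async = use_async
    return builder


def from_dataset_broken(use_threads=True, use_async=False):
    # Accepts use_async but never forwards it, so the caller's value is lost.
    return populate_builder(FakeScannerBuilder(), use_threads=use_threads)


def from_dataset_fixed(use_threads=True, use_async=False):
    # Forwards every option so the builder actually sees it.
    return populate_builder(
        FakeScannerBuilder(), use_threads=use_threads, use_async=use_async
    )
```

In this sketch, from_dataset_broken(use_async=True) still yields a builder
with use_async left at False, which is the failure mode described above.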

##########
File path: python/pyarrow/_dataset.pyx
##########
@@ -2746,6 +2746,10 @@ cdef class Scanner(_Weakrefable):
     use_threads : bool, default True
         If enabled, then maximum parallelism will be used determined by
         the number of available CPU cores.
+    use_async : bool, default False
+        If enabled, the an async scanner will be used that should offer
+        better performance with high-latency/highly-parallel filesystems
+        (e.g. S3)

Review comment:
       Should we also add the parameter to the tests? Some refactoring may be
needed, though, to make it easier to share parameters like this.
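
       One possible shape for such shared parameters, as a rough sketch only:
SCAN_OPTIONS and scan_option_combinations are hypothetical names, not
existing test helpers. The idea is to declare the option grid once and expand
it into keyword-argument dictionaries that any scanner test could iterate
over (or hand to pytest.mark.parametrize).

```python
import itertools

# Hypothetical option grid; the keys mirror the options discussed above.
SCAN_OPTIONS = {
    "use_threads": [True, False],
    "use_async": [True, False],
}


def scan_option_combinations(options=SCAN_OPTIONS):
    """Expand an option grid into one kwargs dict per combination."""
    keys = list(options)
    value_lists = [options[key] for key in keys]
    return [dict(zip(keys, values)) for values in itertools.product(*value_lists)]
```

Adding a new scanner option would then mean adding one entry to the grid
rather than touching each test.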




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

