lidavidm commented on a change in pull request #10118:
URL: https://github.com/apache/arrow/pull/10118#discussion_r633954536
##########
File path: python/pyarrow/_dataset.pyx
##########
@@ -2746,6 +2746,10 @@ cdef class Scanner(_Weakrefable):
use_threads : bool, default True
If enabled, then maximum parallelism will be used determined by
the number of available CPU cores.
+ use_async : bool, default False
+ If enabled, an async scanner will be used that should offer
+ better performance with high-latency/highly-parallel filesystems
+ (e.g. S3)
Review comment:
Ah, sorry, what I mean is that if you're concerned about expanding all
the parameters everywhere, you could declare
```
cdef _populate_builder(shared_ptr[CScannerBuilder]& builder, object kwargs):
    # Extract args from kwargs individually...
```
and then make every user-facing method take just `**kwargs`. Of course, that
just pushes the boilerplate around a bit (e.g. potentially having to cast things
inside `_populate_builder`).
Doesn't need to be done here, just wanted to mention one way to reduce the
number of places where we list all the parameters.
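For illustration, here is a rough pure-Python sketch of that pattern. It is not
the actual Cython code: `FakeScannerBuilder`, `_SCAN_DEFAULTS`, `scan` and
`to_table` are made-up names standing in for the real builder and the
user-facing methods, and only the two options from the docstring above are
included.

```python
# Hypothetical stand-in for the C++ ScannerBuilder; not the pyarrow API.
class FakeScannerBuilder:
    def __init__(self):
        self.options = {}

    def set_option(self, name, value):
        self.options[name] = value


# Assumption: defaults live in one table instead of in every method signature.
_SCAN_DEFAULTS = {"use_threads": True, "use_async": False}


def _populate_builder(builder, kwargs):
    # Reject unknown names so a typo in an option does not pass silently.
    unknown = set(kwargs) - set(_SCAN_DEFAULTS)
    if unknown:
        raise TypeError(f"unexpected scan options: {sorted(unknown)}")
    # Extract each option individually, casting where needed.
    for name, default in _SCAN_DEFAULTS.items():
        builder.set_option(name, bool(kwargs.get(name, default)))
    return builder


def scan(**kwargs):
    # A user-facing entry point only forwards **kwargs.
    return _populate_builder(FakeScannerBuilder(), kwargs)


def to_table(**kwargs):
    # A second entry point reuses the helper without relisting parameters.
    return _populate_builder(FakeScannerBuilder(), kwargs)
```

The trade-off is the one mentioned above: the parameter list appears once, but
the signatures no longer document the options, so the helper has to validate
names and types itself.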