bkietz commented on a change in pull request #9589:
URL: https://github.com/apache/arrow/pull/9589#discussion_r584779485


##########
File path: cpp/src/arrow/dataset/scanner.h
##########
@@ -163,12 +164,20 @@ class ARROW_DS_EXPORT Scanner {
   /// in a concurrent fashion and outlive the iterator.
   Result<ScanTaskIterator> Scan();
 
+  /// \brief Apply a visitor to each RecordBatch as it is scanned. If multiple
+  /// threads are used, the visitor will be invoked from those threads and is
+  /// responsible for any synchronization.
+  Status Scan(std::function<Status(std::shared_ptr<RecordBatch>)> visitor);

Review comment:
The difference is that `ToBatches().Visit(visitor)` would invoke `visitor` exclusively on the thread which called `Visit()`, whereas `Scan(visitor)` invokes `visitor` on the scan's thread pool, so the visitor has to handle its own synchronization. This method is speculative; I'm not sure we'd want to provide it, but I included it as an example.
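
To make the threading distinction concrete, here is a minimal sketch that does not depend on Arrow. Everything in it is a hypothetical stand-in: `Batch` replaces `std::shared_ptr<RecordBatch>`, the visitor returns `void` rather than `Status`, `VisitOnCaller` plays the role of `ToBatches().Visit(visitor)`, and `VisitOnWorkers` plays the role of `Scan(visitor)`, with plain `std::thread` workers substituted for the scan's thread pool. It is not how the scanner is implemented; it only shows which thread ends up running the visitor in each case.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

// Hypothetical stand-ins; none of these names are Arrow APIs.
using Batch = int;
using Visitor = std::function<void(const Batch&)>;

// Analogue of ToBatches().Visit(visitor): every invocation happens on the
// thread that called this function, so the visitor needs no locking.
void VisitOnCaller(const std::vector<Batch>& batches, const Visitor& visitor) {
  for (const Batch& batch : batches) visitor(batch);
}

// Analogue of Scan(visitor): worker threads invoke the visitor, possibly
// concurrently, so the visitor is responsible for its own synchronization.
void VisitOnWorkers(const std::vector<Batch>& batches, const Visitor& visitor,
                    std::size_t num_threads) {
  assert(num_threads > 0);
  std::atomic<std::size_t> next{0};
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < num_threads; ++t) {
    workers.emplace_back([&] {
      // Each worker claims the next unvisited batch until none remain.
      for (std::size_t i = next.fetch_add(1); i < batches.size();
           i = next.fetch_add(1)) {
        visitor(batches[i]);
      }
    });
  }
  for (std::thread& worker : workers) worker.join();
}

// A visitor that records which threads it ran on. The mutex is what
// "responsible for any synchronization" amounts to in practice.
struct ThreadRecorder {
  std::mutex mutex;
  std::set<std::thread::id> thread_ids;
  long long sum = 0;
  std::size_t calls = 0;

  void operator()(const Batch& batch) {
    std::lock_guard<std::mutex> lock(mutex);
    thread_ids.insert(std::this_thread::get_id());
    sum += batch;
    ++calls;
  }
};
```

Passing `[&](const Batch& b) { recorder(b); }` to `VisitOnCaller` leaves exactly one id in `recorder.thread_ids`, the caller's. Passing the same lambda to `VisitOnWorkers` leaves only worker ids, and dropping the mutex there would be a data race on `sum`, `calls`, and the set.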