westonpace commented on a change in pull request #9892:
URL: https://github.com/apache/arrow/pull/9892#discussion_r608871802



##########
File path: cpp/src/arrow/dataset/file_csv.cc
##########
@@ -166,14 +174,15 @@ class CsvScanTask : public ScanTask {
         source_(fragment->source()) {}
 
   Result<RecordBatchIterator> Execute() override {
-    ARROW_ASSIGN_OR_RAISE(auto gen, ExecuteAsync());
+    ARROW_ASSIGN_OR_RAISE(auto gen, 
ExecuteAsync(internal::GetCpuThreadPool()));

Review comment:
       This is a bit awkward.  This method would not be used by either 
`Scanner::ToTable` or `FileSystemDataset::Write` (they will call 
`ExecuteAsync`).  The only way this could be called is if the user called 
`Scan()` directly.
   
   In that case we cannot really use `RunInSerialExecutor` here because it is 
returning an iterator.
   
   I suppose that means there is no truly serial way to call `Scan()` on CSV.  
It wouldn't be parallel exactly, it would just be juggling the work between the 
calling thread, the I/O executor, and the CPU executor.
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to