[GitHub] [arrow] westonpace commented on a change in pull request #9892: ARROW-12208: [C++] Add the ability to run async tasks without using the CPU thread pool

GitBox Wed, 07 Apr 2021 10:55:16 -0700


westonpace commented on a change in pull request #9892:
URL: https://github.com/apache/arrow/pull/9892#discussion_r608871802




##########
File path: cpp/src/arrow/dataset/file_csv.cc
##########
@@ -166,14 +174,15 @@ class CsvScanTask : public ScanTask {
         source_(fragment->source()) {}
 
   Result<RecordBatchIterator> Execute() override {
-    ARROW_ASSIGN_OR_RAISE(auto gen, ExecuteAsync());
+    ARROW_ASSIGN_OR_RAISE(auto gen, 
ExecuteAsync(internal::GetCpuThreadPool()));

Review comment:
       This is a bit awkward.  This method would not be used by either 
`Scanner::ToTable` or `FileSystemDataset::Write` (they will call 
`ExecuteAsync`).  The only way this could be called is if the user called 
`Scan()` directly.
   
   In that case we cannot really use `RunInSerialExecutor` here because it is 
returning an iterator.
   
   I suppose that means there is no truly serial way to call `Scan()` on CSV.  
It wouldn't be parallel exactly, it would just be juggling the work between the 
calling thread, the I/O executor, and the CPU executor.
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] westonpace commented on a change in pull request #9892: ARROW-12208: [C++] Add the ability to run async tasks without using the CPU thread pool

Reply via email to