Weston Pace created ARROW-14974:
-----------------------------------

             Summary: [C++] Dataset scanning, in async mode, is running parquet reads on the CPU thread pool
                 Key: ARROW-14974
                 URL: https://issues.apache.org/jira/browse/ARROW-14974
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Weston Pace


This is something I noticed while doing some profiling a while back.  When 
scanning a large Parquet dataset, many of the read tasks (e.g. I/O reads) were 
running on the CPU thread pool.  Blocking reads tie up CPU threads while they 
wait on I/O, so the CPU thread pool can end up underutilized.

It might not have a large effect on the Parquet read itself (if the reads are 
slow we are probably I/O bound anyway, so one might not notice), but it can 
cause issues in a more complex query where reading is interleaved with CPU 
work (like filtering and joining).
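
As a rough illustration of the separation being asked for, here is a minimal 
sketch in plain Python.  It is not Arrow code: io_pool, cpu_pool, 
blocking_read, filter_rows and scan_fragment are hypothetical stand-ins, and 
the sleep only simulates a blocking read.  The point is that the blocking 
read runs on a dedicated I/O pool, so the small CPU pool stays free for 
filtering.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pools standing in for an I/O executor and a CPU executor.
# The I/O pool is larger because its threads spend most of their time waiting.
io_pool = ThreadPoolExecutor(max_workers=8, thread_name_prefix="io")
cpu_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="cpu")


def blocking_read(fragment_id):
    """Simulate a blocking read; returns (thread name, rows)."""
    time.sleep(0.01)  # stands in for waiting on disk or network
    rows = list(range(fragment_id, fragment_id + 4))
    return threading.current_thread().name, rows


def filter_rows(rows):
    """Simulate CPU work by keeping even values; returns (thread name, rows)."""
    kept = [r for r in rows if r % 2 == 0]
    return threading.current_thread().name, kept


def scan_fragment(fragment_id):
    """Run the read on the I/O pool, then hand the rows to the CPU pool."""
    read_thread, rows = io_pool.submit(blocking_read, fragment_id).result()
    cpu_thread, kept = cpu_pool.submit(filter_rows, rows).result()
    return read_thread, cpu_thread, kept


results = [scan_fragment(i) for i in range(3)]
io_pool.shutdown()
cpu_pool.shutdown()

for read_thread, cpu_thread, kept in results:
    print(f"read on {read_thread}, filtered on {cpu_thread}: {kept}")
```

If blocking_read were submitted to cpu_pool instead, two slow reads would be 
enough to occupy both CPU threads and stall any filtering queued behind them, 
which is the situation described above.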



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
