Weston Pace created ARROW-14974:
-----------------------------------
Summary: [C++] Dataset scanning, in async mode, is running parquet
reads on the CPU thread pool
Key: ARROW-14974
URL: https://issues.apache.org/jira/browse/ARROW-14974
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Weston Pace
This is something I picked up while doing some profiling a while back. When
running a scan of a large parquet dataset, many of the read tasks (e.g. I/O
reads) were running on the CPU thread pool. A thread that is blocked waiting on
I/O cannot do compute work, so this could lead to the CPU thread pool being
underutilized.
It might not have a large effect on the parquet read itself (if the reads are
slow, we are probably I/O bound, so one might not notice), but it can cause
issues in a more complex query where reading is interleaved with CPU work (like
filtering and joining).
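
To make the concern concrete, here is a minimal sketch in standard C++ only. It
does not use Arrow's actual thread pools or scanner; TaggedPool, CurrentTag and
RunReadOn are hypothetical names invented for illustration. Each worker thread
records the name of the pool that owns it, so a task can report where it ran,
which is roughly the kind of check that would expose a read landing on the CPU
pool.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a thread pool (NOT Arrow's ThreadPool).
// Every worker thread remembers the tag of the pool that owns it.
class TaggedPool {
 public:
  TaggedPool(std::string tag, int num_threads) : tag_(std::move(tag)) {
    assert(num_threads > 0);
    for (int i = 0; i < num_threads; ++i) {
      workers_.emplace_back([this] { WorkerLoop(); });
    }
  }

  ~TaggedPool() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_all();
    for (std::thread& worker : workers_) worker.join();
  }

  // Tag of the pool running the calling thread, or "none" off-pool.
  static std::string CurrentTag() { return current_tag_; }

  // Queue a task and return a future for its result.
  std::future<std::string> Submit(std::function<std::string()> task) {
    auto packaged =
        std::make_shared<std::packaged_task<std::string()>>(std::move(task));
    std::future<std::string> result = packaged->get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push([packaged] { (*packaged)(); });
    }
    cv_.notify_one();
    return result;
  }

 private:
  void WorkerLoop() {
    current_tag_ = tag_;  // mark this thread as belonging to this pool
    for (;;) {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
        if (queue_.empty()) return;  // stopping and nothing left to run
        job = std::move(queue_.front());
        queue_.pop();
      }
      job();
    }
  }

  const std::string tag_;
  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> queue_;
  bool stopping_ = false;
  std::vector<std::thread> workers_;
  static thread_local std::string current_tag_;
};

thread_local std::string TaggedPool::current_tag_ = "none";

// Simulated read task: submit it to `pool` and report which pool ran it.
// A real read would block on I/O here, tying up one of that pool's threads.
std::string RunReadOn(TaggedPool& pool) {
  return pool.Submit([] { return TaggedPool::CurrentTag(); }).get();
}
```

With a two-thread pool tagged "cpu" and a larger one tagged "io", passing the
I/O pool to RunReadOn reports "io", while passing the CPU pool reports "cpu".
The latter corresponds to the situation described above: a blocking read
occupying a thread that filtering or joining work could otherwise use.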