[ https://issues.apache.org/jira/browse/ARROW-14354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430198#comment-17430198 ]
Weston Pace commented on ARROW-14354:
-------------------------------------
> Hmm, ok, so reducing the IO thread pool size wouldn't fix this particular
> issue (of Parquet performance), right?
Correct.
> That's also my intuition. Might be worth checking the policy used by
> Postgres, MariaDB and other well-tuned database engines.
Good idea, I'll look into it.
> Hmm, ideally the user should be able to override the IO context but the
> default IO context (if not overridden) should be filesystem-decided. Perhaps
> we need to pass nullptr to say "use the default" (is it already the case?).
Right now IOContext is passed by value. There is
arrow::io::default_io_context(), which is often used as a default argument,
but it is global and not tied to any filesystem. I suppose we can tackle this
as soon as we have a good reason to differentiate between filesystems (which
may be a result of this issue).
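
A minimal sketch of the "filesystem-decided default, user may override"
idea discussed above. None of this is Arrow's actual API: IoContext,
FileSystem, LocalFileSystem, S3LikeFileSystem, and ResolveIoContext are
hypothetical stand-ins, the thread counts are arbitrary placeholders, and
std::optional plays the role of the "pass nullptr to say use the default"
suggestion.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for an I/O context; not arrow::io::IOContext.
struct IoContext {
  std::string name;  // label used only for illustration
  int max_threads;   // placeholder for the I/O parallelism this context allows
};

// Process-wide fallback, similar in spirit to a single global default.
inline const IoContext& GlobalDefaultIoContext() {
  static const IoContext ctx{"global", 8};
  return ctx;
}

// Hypothetical filesystem base class: each filesystem can decide its own
// default I/O context, falling back to the global one.
struct FileSystem {
  virtual ~FileSystem() = default;
  virtual IoContext DefaultIoContext() const { return GlobalDefaultIoContext(); }
};

// Local reads: assume a small pool is enough (placeholder value).
struct LocalFileSystem : FileSystem {
  IoContext DefaultIoContext() const override { return {"local", 2}; }
};

// Remote (HTTP-like) reads: assume high parallelism helps (placeholder value).
struct S3LikeFileSystem : FileSystem {
  IoContext DefaultIoContext() const override { return {"remote", 32}; }
};

// An explicit user override wins; an empty optional means "use whatever the
// filesystem considers its default".
inline IoContext ResolveIoContext(
    const FileSystem& fs,
    const std::optional<IoContext>& user_override = std::nullopt) {
  return user_override.has_value() ? *user_override : fs.DefaultIoContext();
}
```

With this shape, ResolveIoContext(LocalFileSystem{}) yields the "local"
context, while ResolveIoContext(LocalFileSystem{}, IoContext{"custom", 4})
yields the caller's context unchanged.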
> [C++] Investigate reducing I/O thread pool size to avoid CPU wastage.
> ---------------------------------------------------------------------
>
> Key: ARROW-14354
> URL: https://issues.apache.org/jira/browse/ARROW-14354
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Weston Pace
> Priority: Major
>
> If we are reading over HTTP (e.g. S3) we generally want high parallelism in
> the I/O thread pool.
> If we are reading from disk then high parallelism is usually harmless but
> ineffective. Most of the I/O threads will spend their time in a waiting
> state and the cores can be used for other work.
> However, it appears that when we are reading locally, and the data is cached
> in memory, then having too much parallelism will be harmful, but some
> parallelism is beneficial. Once the DRAM <-> CPU bandwidth limit is hit,
> all reading threads will experience high DRAM latency. Unlike an I/O
> bottleneck, a RAM bottleneck will waste cycles on the physical core.
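
The three cases in the description could be sketched as a sizing heuristic
like the one below. This is only an illustration: SourceKind and
SuggestIoParallelism are hypothetical names, and the multipliers and caps
are untuned placeholders, not measured values or anything Arrow does today.

```cpp
#include <algorithm>

// Hypothetical classification of where the bytes come from.
enum class SourceKind { kRemote, kLocalDisk, kCachedInMemory };

// Suggest an I/O thread count for a source kind, given the number of cores.
// All constants are placeholders chosen only to reflect the qualitative
// claims in the issue description.
inline int SuggestIoParallelism(SourceKind kind, int cpu_cores) {
  const int cores = std::max(1, cpu_cores);
  switch (kind) {
    case SourceKind::kRemote:
      // HTTP-like reads are latency-bound, so oversubscribe the cores.
      return std::max(8, 4 * cores);
    case SourceKind::kLocalDisk:
      // Extra threads mostly wait; harmless but not helpful beyond this.
      return cores;
    case SourceKind::kCachedInMemory:
      // Some parallelism helps, but too many readers saturate DRAM
      // bandwidth and burn cycles, so keep the pool small.
      return std::clamp(cores / 4, 1, 4);
  }
  return cores;  // unreachable for valid enum values
}
```

For example, on an 8-core machine this suggests 32 threads for remote
reads, 8 for local disk, and 2 for data already cached in memory.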