Weston Pace created ARROW-12090:
-----------------------------------
Summary: [C++] Expose CSV block level readahead as a read option
Key: ARROW-12090
URL: https://issues.apache.org/jira/browse/ARROW-12090
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Weston Pace
Assignee: Weston Pace
All of the CSV readers today base their I/O readahead on the parallelism of the
executor (or 2 for the serial reader). This is a reasonable default if the I/O
is homogeneous, but better values could presumably be used in some situations.
For example, if most files are buffered in RAM (and the reader is CPU bound for
these files) but some files are not, then you would want the readahead to be
large enough to read the unbuffered files while the CPU bound work is being
done (assuming you are even lucky enough for things to be scheduled in that way).
This isn't likely to be of much benefit in most situations, though, and it does
add yet another option, so I'm not really motivated to do this work until such a
situation arises.
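
For illustration only, here is a rough sketch of how such an option could fall
back to the current default. All names below (CsvReadaheadOptions,
block_readahead, DefaultBlockReadahead, EffectiveBlockReadahead) are
hypothetical and are not part of the existing Arrow C++ API. Clamping the
executor capacity to at least 1 is also an assumption of this sketch.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>

// Hypothetical option struct; NOT the actual arrow::csv::ReadOptions.
struct CsvReadaheadOptions {
  // Number of blocks to read ahead. Unset means "use the default policy".
  std::optional<int32_t> block_readahead;
};

// Default described in this issue: the executor's parallelism for the
// threaded readers, or 2 for the serial reader.
// Assumption: a capacity below 1 is clamped to 1.
inline int32_t DefaultBlockReadahead(bool use_threads,
                                     int32_t executor_capacity) {
  return use_threads ? std::max<int32_t>(executor_capacity, 1) : 2;
}

// Use the user-supplied value when it is set and positive; otherwise fall
// back to the default above.
inline int32_t EffectiveBlockReadahead(const CsvReadaheadOptions& options,
                                       bool use_threads,
                                       int32_t executor_capacity) {
  if (options.block_readahead.has_value() && *options.block_readahead > 0) {
    return *options.block_readahead;
  }
  return DefaultBlockReadahead(use_threads, executor_capacity);
}
```

With something like this, a user with a mix of buffered and unbuffered files
could set block_readahead higher than the executor's parallelism, while
everyone else would keep today's behavior by leaving it unset.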