[
https://issues.apache.org/jira/browse/ARROW-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Weston Pace closed ARROW-12090.
-------------------------------
Resolution: Won't Fix
I don't think I'm ever going to do this. It's a very fine-grained tuning
option. If we start to see it matter (e.g. different datasets need different
values), then we should find a way to detect the correct value.
> [C++] Expose CSV I/O readahead as a read option
> -----------------------------------------------
>
> Key: ARROW-12090
> URL: https://issues.apache.org/jira/browse/ARROW-12090
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Weston Pace
> Assignee: Weston Pace
> Priority: Minor
>
> All of the CSV readers today base their I/O readahead on the parallelism of
> the executor (or 2 for the serial reader). This is a reasonable default if
> the I/O is homogeneous, but better values could presumably be used in some
> situations.
> For example, if most files are buffered in RAM (and the reader is CPU-bound
> for these files) but some files are not, then you would want the readahead to
> be large enough to read the unbuffered files while the CPU-bound work is
> being done (assuming you are even lucky enough for things to be scheduled in
> that way).
> This isn't likely to be of much benefit in most situations, though, and it
> does add yet another option, so I'm not really motivated to do this work
> until such a situation arises.
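
For illustration, the default described in the quoted issue can be sketched
roughly as below. This is a minimal sketch and not Arrow's actual code: the
names DefaultReadahead and ResolveReadahead and the io_readahead override are
hypothetical, and Arrow exposes no such option (the issue was closed as
Won't Fix). It assumes C++17 for std::optional.

```cpp
#include <cassert>
#include <optional>

// Hypothetical sketch; none of these names exist in Arrow.
// Default described in the issue: the readahead follows the executor's
// parallelism, or is 2 for the serial reader.
int DefaultReadahead(bool use_threads, int executor_parallelism) {
  assert(executor_parallelism > 0);
  return use_threads ? executor_parallelism : 2;
}

// Hypothetical override: roughly what an exposed read option could look like.
// An unset or non-positive value falls back to the default above.
int ResolveReadahead(std::optional<int> io_readahead, bool use_threads,
                     int executor_parallelism) {
  if (io_readahead.has_value() && *io_readahead > 0) {
    return *io_readahead;
  }
  return DefaultReadahead(use_threads, executor_parallelism);
}
```

In the mixed scenario from the issue (most files cached in RAM, a few not), a
caller would pass a larger io_readahead so that slow reads overlap with the
CPU-bound parsing; everything else keeps the parallelism-based default.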