[jira] [Closed] (ARROW-12090) [C++] Expose CSV I/O readahead as a read option

Weston Pace (Jira) Mon, 07 Jun 2021 13:51:07 -0700


     [ https://issues.apache.org/jira/browse/ARROW-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Weston Pace closed ARROW-12090.
-------------------------------
    Resolution: Won't Fix

I don't think I'm ever going to do this.  It's a very fine-grained tuning 
option.  If we start to see it matter (e.g. different datasets need different 
values) then we should find a way to detect the correct value.

> [C++] Expose CSV I/O readahead as a read option
> -----------------------------------------------
>
>                 Key: ARROW-12090
>                 URL: https://issues.apache.org/jira/browse/ARROW-12090
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Assignee: Weston Pace
>            Priority: Minor
>
> All of the CSV readers today base their I/O readahead on the parallelism of 
> the executor (or 2 for the serial reader).  This is a reasonable default if 
> the I/O is homogeneous but better values could presumably be used for some 
> situations.
> For example, if most files are buffered in RAM (and the reader is CPU bound 
> for these files) but some files are not, then you would want the readahead to 
> be large enough to read the unbuffered files while the CPU bound work is 
> being done (assuming you are even lucky enough for things to be scheduled in 
> that way).
> This isn't likely to be of much benefit in most situations, though, and it 
> does add yet another option, so I'm not really motivated to do this work 
> until such a situation arises.
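
For readers skimming the archive, the default described above, and the shape the 
proposed option might have taken, can be sketched roughly as follows.  This is 
only an illustration under stated assumptions: every name here (DefaultReadahead, 
HypotheticalReadOptions, io_readahead, EffectiveReadahead) is made up for the 
example and is not Arrow's actual API; only the "executor parallelism, or 2 for 
the serial reader" rule comes from the issue text.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names throughout; this is NOT Arrow's actual API.
// The constant reflects the "2 for the serial reader" default from the issue.
constexpr int32_t kSerialReadahead = 2;

// Default policy as described in the issue: the threaded readers use the
// executor's parallelism as the readahead, the serial reader uses 2.
inline int32_t DefaultReadahead(bool use_threads, int32_t executor_capacity) {
  assert(executor_capacity >= 1);
  return use_threads ? executor_capacity : kSerialReadahead;
}

// One way the proposed option could have looked (assumption): a field where
// a value <= 0 means "keep the default policy".
struct HypotheticalReadOptions {
  bool use_threads = true;
  int32_t io_readahead = 0;  // hypothetical field, for illustration only
};

// Resolve the readahead actually used: an explicit override wins,
// otherwise fall back to the default policy above.
inline int32_t EffectiveReadahead(const HypotheticalReadOptions& opts,
                                  int32_t executor_capacity) {
  if (opts.io_readahead > 0) return opts.io_readahead;
  return DefaultReadahead(opts.use_threads, executor_capacity);
}
```

With an 8-thread executor, this sketch yields a readahead of 8 for the threaded 
readers and 2 for the serial one; setting the hypothetical io_readahead to 32 
would override both, which is the kind of tuning the mixed buffered/unbuffered 
case in the description would call for.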



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
