[ https://issues.apache.org/jira/browse/ARROW-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307362#comment-17307362 ]
Antoine Pitrou commented on ARROW-12030:
----------------------------------------
In the pull model it seems you only need readahead at the junction between IO
and CPU tasks, right?
In the push model, it seems you would need to add buffering *and* blocking at
every level.
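
To make the pull-model case concrete, here is a minimal sketch with a single bounded readahead buffer sitting between an I/O stage and the CPU stages that pull from it. It is not Arrow's implementation: `readahead`, `read_blocks`, `decode` and `max_items` are hypothetical names chosen for illustration, and plain Python generators stand in for the scanner's iterators.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def readahead(source, max_items):
    """Yield items from `source`, prefetching at most `max_items` of them
    on a background thread (hypothetical helper, not an Arrow API)."""
    buf = queue.Queue(maxsize=max_items)

    def producer():
        # Simplification: errors raised by `source` are not forwarded
        # to the consumer in this sketch.
        for item in source:
            buf.put(item)  # blocks while the buffer is full
        buf.put(_DONE)

    # Daemon thread, so an abandoned consumer does not keep the process alive.
    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()
        if item is _DONE:
            return
        yield item


def read_blocks(n):
    """Stand-in for the I/O stage: yields n small blocks."""
    for i in range(n):
        yield bytes([i]) * 4


def decode(blocks):
    """Stand-in for a CPU stage: pulls one block at a time."""
    for block in blocks:
        yield len(block)


# Only the I/O -> CPU junction is buffered; the stages downstream simply pull.
result = sum(decode(readahead(read_blocks(8), max_items=2)))
```

The downstream stages need no buffer of their own here because each one only asks for the next item when it is ready for it; the single bounded queue is the only place where the reader can run ahead of the consumer.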
> [C++] Change dataset readahead to be based on available RAM/CPU instead of
> fixed constants/options
> --------------------------------------------------------------------------------------------------
>
> Key: ARROW-12030
> URL: https://issues.apache.org/jira/browse/ARROW-12030
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Weston Pace
> Assignee: Weston Pace
> Priority: Major
>
> Right now there are a few places in dataset scanning where we add
> readahead. At each spot we have to pick some maximum for how much we read
> ahead. Instead of trying to figure out a maximum, it might be nicer to base
> it on the available RAM.
> On the other hand, it may be the case that there is some set of nice
> constants that just always works, so this can probably wait until we
> better understand the memory usage of dataset scanning.
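
As a rough illustration of what the description proposes, the sketch below derives a readahead depth from a memory budget instead of a fixed constant. Everything in it is an assumption made for illustration: the function name `readahead_limit`, the `memory_fraction` default, and the cap of two blocks per CPU core are not taken from the issue or from Arrow. The caller is expected to supply `available_bytes` from whatever source it trusts.

```python
import os


def readahead_limit(available_bytes, block_bytes, memory_fraction=0.25,
                    cpu_count=None):
    """Return how many blocks to read ahead (hypothetical helper).

    available_bytes: free RAM as measured by the caller.
    block_bytes:     expected in-memory size of one block.
    memory_fraction: share of free RAM readahead may use (assumed default).
    cpu_count:       number of cores; defaults to os.cpu_count().
    """
    if block_bytes <= 0:
        raise ValueError("block_bytes must be positive")
    cpus = cpu_count or os.cpu_count() or 1
    budget = int(available_bytes * memory_fraction)
    by_memory = budget // block_bytes
    # Assumed heuristic: buffering more than two blocks per core is
    # unlikely to help, so cap the depth there. Always allow one block.
    return max(1, min(by_memory, 2 * cpus))


# 1 GiB free, 8 MiB blocks, 4 cores: memory allows 32 blocks, CPU cap gives 8.
depth = readahead_limit(1 << 30, 8 << 20, cpu_count=4)
```

Whether a heuristic like this beats a well-chosen constant is exactly the open question raised above; the sketch only shows that the computation itself is small.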