[ https://issues.apache.org/jira/browse/ARROW-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17305506#comment-17305506 ]
Antoine Pitrou commented on ARROW-12030:
----------------------------------------

Letting the user specify a RAM limit would be reasonable IMHO. But I also wonder if it makes sense to have readahead at that many levels.

> Change dataset readahead to be based on available RAM/CPU instead of fixed constants/options
> --------------------------------------------------------------------------------------------
>
>                 Key: ARROW-12030
>                 URL: https://issues.apache.org/jira/browse/ARROW-12030
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Assignee: Weston Pace
>            Priority: Major
>
> Right now in the dataset scanning there are a few places where we add readahead. At each spot we have to pick some max for how much we read ahead. Instead of trying to figure out some max it might be nicer to base it on the available RAM.
> On the other hand, it may be the case that there is some set of nice constants that just always works so this can probably wait until we understand more the memory usage of dataset scanning.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)