[ https://issues.apache.org/jira/browse/ARROW-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17305409#comment-17305409 ]
Antoine Pitrou commented on ARROW-12030:
----------------------------------------

I don't think we should do any elaborate guesswork, *especially* based on available RAM (how do you evaluate that?). Instead, we can simply expose the readahead parameter to the user.

> Change dataset readahead to be based on available RAM/CPU instead of fixed constants/options
> --------------------------------------------------------------------------------------------
>
>                 Key: ARROW-12030
>                 URL: https://issues.apache.org/jira/browse/ARROW-12030
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Assignee: Weston Pace
>            Priority: Major
>
> Right now in the dataset scanning there are a few places where we add readahead. At each spot we have to pick some max for how much we read ahead. Instead of trying to figure out some max it might be nicer to base it on the available RAM.
> On the other hand, it may be the case that there is some set of nice constants that just always works so this can probably wait until we understand more the memory usage of dataset scanning.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)