Weston Pace created ARROW-16294:
-----------------------------------
Summary: [C++] Improve performance of parquet readahead
Key: ARROW-16294
URL: https://issues.apache.org/jira/browse/ARROW-16294
Project: Apache Arrow
Issue Type: Improvement
Reporter: Weston Pace
In 7.0.0, readahead for parquet would read up to 256 row groups at once, which
meant that, if the consumer was too slow, we would almost certainly run out of
memory.
ARROW-15410 improved readahead as a whole and, in the process, changed parquet
so it's always reading 1 row group in advance.
This is not always ideal in S3 scenarios, where each read has high latency. If
the row groups are small, we may want to read many of them in advance. To fix
this we should keep issuing row group reads in parallel until at least
batch_size * batch_readahead rows are being fetched.
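
A rough sketch of the proposed rule, in Python for brevity rather than the C++
the change would actually be written in. The function name, its parameters, and
the idea of tracking a running "rows in flight" count are all assumptions made
for illustration; none of this is taken from the Arrow code base.

```python
from typing import List


def row_groups_to_prefetch(
    row_group_sizes: List[int],
    next_index: int,
    rows_in_flight: int,
    batch_size: int,
    batch_readahead: int,
) -> List[int]:
    """Pick which row groups to start fetching now (hypothetical helper).

    row_group_sizes: number of rows in each row group of the file.
    next_index:      first row group that has not been requested yet.
    rows_in_flight:  rows already requested but not yet consumed.
    """
    # Target from the issue text: batch_size * batch_readahead rows.
    target_rows = batch_size * batch_readahead

    to_fetch = []
    i = next_index
    # Keep issuing reads until enough rows are in flight or the file ends.
    while i < len(row_group_sizes) and rows_in_flight < target_rows:
        to_fetch.append(i)
        rows_in_flight += row_group_sizes[i]
        i += 1
    return to_fetch
```

Under this sketch, small row groups lead to many concurrent reads, while a
single large row group already satisfies the target on its own, so memory use
stays bounded by roughly the target plus one row group.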