[ https://issues.apache.org/jira/browse/PARQUET-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jinpeng Zhou updated PARQUET-2316:
----------------------------------
Description:
The current FileReader can only work in one of two modes, coalescing (when
Prebuffer is called) or non-coalescing (when Prebuffer is not called), because
of the if statement here:
[https://github.com/apache/arrow/blob/main/cpp/src/parquet/file_reader.cc#L203].
Since Prebuffer caches all of the specified column chunks, it raises concerns
about memory usage on systems with a tight memory budget. In such scenarios,
one may want to Prebuffer some small chunks while reading the remaining chunks
through BufferedInputStream.
was:
The current FileReader can only work in one of two modes, coalescing (when
Prebuffer is called) or non-coalescing (when Prebuffer is not called), because
of the if statement
[here](https://github.com/apache/arrow/blob/main/cpp/src/parquet/file_reader.cc#L203).
Since Prebuffer caches all of the specified column chunks, it raises concerns
about memory usage on systems with a tight memory budget. In such scenarios,
one may want to Prebuffer some small chunks while reading the remaining chunks
through BufferedInputStream.
> Allow partial prebuffer in parquet FileReader
> ---------------------------------------------
>
> Key: PARQUET-2316
> URL: https://issues.apache.org/jira/browse/PARQUET-2316
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-cpp
> Reporter: Jinpeng Zhou
> Assignee: Jinpeng Zhou
> Priority: Minor
> Fix For: cpp-12.0.0
>
>
> The current FileReader can only work in one of two modes, coalescing
> (when Prebuffer is called) or non-coalescing (when Prebuffer is not called),
> because of the if statement here:
> [https://github.com/apache/arrow/blob/main/cpp/src/parquet/file_reader.cc#L203].
>
> Since Prebuffer caches all of the specified column chunks, it raises concerns
> about memory usage on systems with a tight memory budget. In such scenarios,
> one may want to Prebuffer some small chunks while reading the remaining
> chunks through BufferedInputStream.
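To make the requested behavior concrete, here is a minimal C++ sketch of one possible selection policy. It assumes a reader that records a decision per column chunk instead of making one reader-wide "was Prebuffer called?" check. `ReadPath` and `PlanPartialPrebuffer` are hypothetical names, not part of the parquet-cpp API. The smallest-first, fixed-byte-budget policy is also an illustrative assumption: chunks are prebuffered from smallest to largest until the budget would be exceeded, and every other chunk is marked for reading through a buffered stream.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Where a column chunk's bytes come from (hypothetical model, not Arrow's API).
enum class ReadPath { kPrebuffered, kBufferedStream };

// Decide, per column chunk, whether to prebuffer it or read it on demand.
// Chunks are considered from smallest to largest and prebuffered while the
// total cached bytes stay within `budget_bytes`; the rest are streamed.
std::vector<ReadPath> PlanPartialPrebuffer(const std::vector<int64_t>& chunk_sizes,
                                           int64_t budget_bytes) {
  assert(budget_bytes >= 0);

  // Column indices sorted by chunk size; stable so ties keep column order.
  std::vector<std::size_t> order(chunk_sizes.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
    return chunk_sizes[a] < chunk_sizes[b];
  });

  // Default every chunk to the buffered-stream path.
  std::vector<ReadPath> plan(chunk_sizes.size(), ReadPath::kBufferedStream);
  int64_t cached_bytes = 0;
  for (std::size_t column : order) {
    // Sizes are ascending, so once one chunk does not fit, none of the
    // remaining (larger) chunks will fit either.
    if (cached_bytes + chunk_sizes[column] > budget_bytes) break;
    cached_bytes += chunk_sizes[column];
    plan[column] = ReadPath::kPrebuffered;
  }
  return plan;
}
```

For hypothetical chunk sizes of 4 KiB, 64 MiB, 8 KiB and 128 MiB with a 1 MiB budget, this marks columns 0 and 2 as prebuffered and columns 1 and 3 for the buffered stream. A reader built along these lines would consult `plan[i]` when opening column chunk `i`, so the prebuffer cache only ever holds the small chunks.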
--
This message was sent by Atlassian Jira
(v8.20.10#820010)