[jira] [Updated] (PARQUET-2316) Allow partial prebuffer in parquet FileReader

Jinpeng Zhou (Jira) Tue, 20 Jun 2023 14:51:05 -0700


     [ 
https://issues.apache.org/jira/browse/PARQUET-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jinpeng Zhou updated PARQUET-2316:
----------------------------------
    Description: 
The current FileReader can only work in one of two modes, coalescing (when 
Prebuffer is called) or non-coalescing (when Prebuffer is not called), due to 
the if statement here: 
https://github.com/apache/arrow/blob/main/cpp/src/parquet/file_reader.cc#L203
 

Since Prebuffer caches all of the specified column chunks, it raises concerns 
about memory usage on systems with a tight memory budget. In such scenarios, 
one may want to Prebuffer some small chunks while still being able to read the 
remaining chunks using BufferedInputStream. 

  was:
The current FileReader can only work in one of two modes, coalescing (when 
Prebuffer is called) or non-coalescing (when Prebuffer is not called), due to 
the if statement 
[here](https://github.com/apache/arrow/blob/main/cpp/src/parquet/file_reader.cc#L203).
 

Since Prebuffer caches all of the specified column chunks, it raises concerns 
about memory usage on systems with a tight memory budget. In such scenarios, 
one may want to Prebuffer some small chunks while still being able to read the 
remaining chunks using BufferedInputStream. 


> Allow partial prebuffer in parquet FileReader
> ---------------------------------------------
>
>                 Key: PARQUET-2316
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2316
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp
>            Reporter: Jinpeng Zhou
>            Assignee: Jinpeng Zhou
>            Priority: Minor
>             Fix For: cpp-12.0.0
>
>
> The current FileReader can only work in one of two modes, coalescing 
> (when Prebuffer is called) or non-coalescing (when Prebuffer is not called), 
> due to the if statement here: 
> https://github.com/apache/arrow/blob/main/cpp/src/parquet/file_reader.cc#L203
>  
> Since Prebuffer caches all of the specified column chunks, it raises 
> concerns about memory usage on systems with a tight memory budget. In such 
> scenarios, one may want to Prebuffer some small chunks while still being able 
> to read the remaining chunks using BufferedInputStream. 
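
To make the memory-budget scenario in the description concrete, here is a 
minimal sketch of how a caller might decide which column chunks to Prebuffer 
and which to leave for buffered streaming. It is not code from Arrow or from 
this issue: ChunkInfo, PrebufferPlan, and PlanPartialPrebuffer are hypothetical 
names, the smallest-first greedy policy is only one possible choice, and the 
chunk sizes are assumed to come from the file metadata.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one column chunk: its column index and its
// compressed size in bytes (assumed to be read from the file metadata).
struct ChunkInfo {
  int column_index;
  int64_t compressed_size;
};

// Hypothetical plan: columns to cache up front versus columns to stream.
struct PrebufferPlan {
  std::vector<int> prebuffer_columns;  // candidates for Prebuffer
  std::vector<int> streamed_columns;   // left to a buffered input stream
};

// Greedy policy (an assumption, not Arrow behavior): take chunks from
// smallest to largest and cache each one that still fits in the budget.
PrebufferPlan PlanPartialPrebuffer(std::vector<ChunkInfo> chunks,
                                   int64_t budget_bytes) {
  assert(budget_bytes >= 0);
  std::stable_sort(chunks.begin(), chunks.end(),
                   [](const ChunkInfo& a, const ChunkInfo& b) {
                     return a.compressed_size < b.compressed_size;
                   });

  PrebufferPlan plan;
  int64_t used = 0;
  for (const ChunkInfo& chunk : chunks) {
    if (chunk.compressed_size <= budget_bytes - used) {
      plan.prebuffer_columns.push_back(chunk.column_index);
      used += chunk.compressed_size;
    } else {
      plan.streamed_columns.push_back(chunk.column_index);
    }
  }

  // Return column indices in ascending order for readability.
  std::sort(plan.prebuffer_columns.begin(), plan.prebuffer_columns.end());
  std::sort(plan.streamed_columns.begin(), plan.streamed_columns.end());
  return plan;
}
```

With a change like the one requested here, prebuffer_columns would be the 
column list handed to Prebuffer, while streamed_columns would be read through 
BufferedInputStream. Reading the second group after a Prebuffer call is 
exactly what the current if statement prevents.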



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
