[GitHub] [arrow] jp0317 opened a new pull request, #36510: PARQUET-2321: [C++] allow customized buffer size when creating ArrowInputStream for a column PageReader

via GitHub Thu, 06 Jul 2023 12:57:24 -0700


jp0317 opened a new pull request, #36510:
URL: https://github.com/apache/arrow/pull/36510


   ### Rationale for this change
   
   When the buffered stream is enabled, all column chunks currently share the 
same buffer size, regardless of their actual sizes. That size is stored in the 
shared [read properties](https://github.com/apache/arrow/blob/main/cpp/src/parquet/file_reader.cc#L213).
  
   
   Given a limited memory budget, one may want to customize the buffer size for 
different column chunks based on their actual sizes, so that smaller chunks 
consume less of the memory budget for their buffers.
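
To make the rationale above concrete, here is a minimal sketch of one possible sizing policy. It is not the code in this PR: `ChooseBufferSize` and `TotalBufferBytes` are hypothetical helpers introduced only for illustration, and the policy (cap each chunk's buffer at the chunk's own size, never exceeding the shared size) is an assumption about how a caller might pick the new per-chunk value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Arrow/Parquet): pick a per-chunk buffer
// size that never exceeds the chunk's own size or the shared buffer size.
inline int64_t ChooseBufferSize(int64_t chunk_bytes, int64_t shared_buffer_size) {
  assert(shared_buffer_size > 0);
  // Keep at least one byte so an empty chunk still yields a valid size.
  return std::clamp<int64_t>(chunk_bytes, 1, shared_buffer_size);
}

// Hypothetical helper: total buffer memory needed if every listed chunk is
// buffered at the same time under the policy above.
inline int64_t TotalBufferBytes(const std::vector<int64_t>& chunk_sizes,
                                int64_t shared_buffer_size) {
  int64_t total = 0;
  for (int64_t size : chunk_sizes) {
    total += ChooseBufferSize(size, shared_buffer_size);
  }
  return total;
}
```

For example, with chunks of 512 B, 4 KiB, and 1 MiB and an illustrative shared size of 16 KiB, this policy reserves 20992 bytes in total, whereas giving every chunk the shared size would reserve 49152 bytes.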
   
   ### What changes are included in this PR?
   
   Code for customizing the buffer size, plus unit tests.
   
   ### Are these changes tested?
   
   Yes.
   
   ### Are there any user-facing changes?
   
   Some APIs in file_reader.h are extended with a new parameter that defines the 
customized buffer size for a column chunk. The default behavior (when the new 
parameter is not specified) does not change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
