mapleFU commented on issue #36765:
URL: https://github.com/apache/arrow/issues/36765#issuecomment-1735491961
```
Status FileReaderImpl::GetRecordBatchReader(const std::vector<int>& row_groups,
                                            const std::vector<int>& column_indices,
                                            std::unique_ptr<RecordBatchReader>* out) {
  RETURN_NOT_OK(BoundsCheck(row_groups, column_indices));

  if (reader_properties_.pre_buffer()) {
    // PARQUET-1698/PARQUET-1820: pre-buffer row groups/column chunks if enabled
    BEGIN_PARQUET_CATCH_EXCEPTIONS
    reader_->PreBuffer(row_groups, column_indices, reader_properties_.io_context(),
                       reader_properties_.cache_options());
    END_PARQUET_CATCH_EXCEPTIONS
  }
```
Here, `PreBuffer` will try to buffer the required row groups/column chunks, and that memory will not be released until the read is finished. This is different from buffering mode (actually, buffering mode might decrease memory usage, lol). Even when the cache policy is lazy, the reader might not get faster if the row group is large enough, and the memory still isn't released before the read finishes. So I wonder if this is ok.
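For context, here is a minimal sketch (not from the issue) of how a caller would end up on this code path: enabling `pre_buffer` together with the lazy cache policy on `ArrowReaderProperties`. The file path, row group, and column indices are placeholders.

```
#include <memory>
#include <string>
#include <vector>

#include "arrow/io/caching.h"
#include "arrow/io/file.h"
#include "arrow/record_batch.h"
#include "arrow/result.h"
#include "arrow/status.h"
#include "parquet/arrow/reader.h"
#include "parquet/properties.h"

arrow::Status ReadWithPreBuffer(const std::string& path) {
  ARROW_ASSIGN_OR_RAISE(auto file, arrow::io::ReadableFile::Open(path));

  parquet::ArrowReaderProperties arrow_props;
  // This flag is what triggers the reader_->PreBuffer(...) call quoted above.
  arrow_props.set_pre_buffer(true);
  // Lazy policy defers the actual reads, but the coalesced ranges for the
  // requested row groups are still cached until the read finishes.
  arrow_props.set_cache_options(arrow::io::CacheOptions::LazyDefaults());

  parquet::arrow::FileReaderBuilder builder;
  ARROW_RETURN_NOT_OK(builder.Open(std::move(file)));
  builder.properties(arrow_props);
  std::unique_ptr<parquet::arrow::FileReader> reader;
  ARROW_RETURN_NOT_OK(builder.Build(&reader));

  // Read only row group 0, columns 0 and 1; these become the row_groups /
  // column_indices that PreBuffer() receives.
  std::unique_ptr<arrow::RecordBatchReader> batch_reader;
  ARROW_RETURN_NOT_OK(reader->GetRecordBatchReader(
      /*row_groups=*/{0}, /*column_indices=*/{0, 1}, &batch_reader));

  std::shared_ptr<arrow::RecordBatch> batch;
  while (true) {
    ARROW_RETURN_NOT_OK(batch_reader->ReadNext(&batch));
    if (batch == nullptr) break;  // end of stream
  }
  return arrow::Status::OK();
}
```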