mapleFU commented on PR #37854:
URL: https://github.com/apache/arrow/pull/37854#issuecomment-1735294931
```c++
Status FileReaderImpl::GetRecordBatchReader(const std::vector<int>& row_groups,
                                            const std::vector<int>& column_indices,
                                            std::unique_ptr<RecordBatchReader>* out) {
  RETURN_NOT_OK(BoundsCheck(row_groups, column_indices));
  if (reader_properties_.pre_buffer()) {
    // PARQUET-1698/PARQUET-1820: pre-buffer row groups/column chunks if enabled
    BEGIN_PARQUET_CATCH_EXCEPTIONS
    reader_->PreBuffer(row_groups, column_indices,
                       reader_properties_.io_context(),
                       reader_properties_.cache_options());
    END_PARQUET_CATCH_EXCEPTIONS
  }
```
Here, `PreBuffer` will buffer the required RowGroups if necessary, and that
memory will not be released until the read is finished. This is different from
`buffering` mode (buffering mode might actually decrease memory usage, lol).
Even when the cache policy is `lazy`, the reader might not get faster if the
RowGroup is large enough, and the memory still will not be released before the
read is finished. So I wonder if this is OK.
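
To make the memory concern concrete, here is a rough toy model, not Arrow code. It assumes each requested RowGroup occupies a fixed number of bytes once buffered, and compares the peak bytes held when buffers are kept until the whole read finishes against a hypothetical reader that drops each RowGroup's buffer after decoding it. The function names are made up for illustration only.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical toy model: row_group_bytes[i] is the number of bytes the i-th
// requested RowGroup occupies once it has been buffered.

// Peak bytes held if every buffered RowGroup stays cached until the whole
// read is finished (the behavior described above for pre-buffering).
int64_t PeakHeldUntilReadFinishes(const std::vector<int64_t>& row_group_bytes) {
  int64_t held = 0;
  int64_t peak = 0;
  for (int64_t bytes : row_group_bytes) {
    held += bytes;  // buffered and never released during the read
    peak = std::max(peak, held);
  }
  return peak;
}

// Peak bytes held if each RowGroup's buffer is released right after that
// RowGroup has been decoded (an assumed alternative, for comparison only).
int64_t PeakReleasedPerRowGroup(const std::vector<int64_t>& row_group_bytes) {
  int64_t peak = 0;
  for (int64_t bytes : row_group_bytes) {
    peak = std::max(peak, bytes);  // only one RowGroup is held at a time
  }
  return peak;
}
```

Under this model, the first function returns the sum of all requested RowGroup sizes and the second returns the size of the largest RowGroup, so the gap grows with the number of RowGroups being read. A `lazy` policy would only change when the bytes are fetched, not the final amount held, if entries are retained as described above.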