justinli500 commented on code in PR #49855:
URL: https://github.com/apache/arrow/pull/49855#discussion_r3425817905
##########
cpp/src/parquet/file_reader.cc:
##########
@@ -432,6 +433,40 @@ class SerializedFile : public ParquetFileReader::Contents {
return cached_source_->WaitFor(ranges);
}
+ // Evict cached bytes that were populated by PreBuffer() for the given row
+ // groups and column indices. Callers should only invoke this once the
+ // corresponding row group data has been fully decoded and no readers are
+ // holding a reference to the cached buffers.
+ void EvictPreBufferedData(const std::vector<int>& row_groups,
+ const std::vector<int>& column_indices) {
+ if (!cached_source_) {
+ return;
+ }
+ for (int row : row_groups) {
Review Comment:
Thanks for the feedback! Both addressed:
- Cross-RG eviction: the reader now tracks runs of evicted row groups and
evicts each run's combined window, so a coalesced cache entry is freed once all
of the row groups it spans have been evicted (`EvictEntriesInRange` is
unchanged). Runs are merged by adjacency in the buffered set rather than by
row group index, so filtered/non-contiguous scans are covered too, and the
logic is safe under out-of-order eviction. Tests added for all three cases.
- Comments: trimmed so they state only the invariants.
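
For readers following along, here is a minimal standalone sketch of the run-merging idea described above. It is not the code in this PR: `ByteRange`, `EvictionRunTracker`, and `Contains` are hypothetical names invented for illustration, and the sketch assumes that each buffered row group covers a single byte span and that spans increase with row group index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for a byte range in the file (not an Arrow type).
struct ByteRange {
  int64_t offset;
  int64_t length;
};

// True if `entry` lies entirely inside `window`.
inline bool Contains(const ByteRange& window, const ByteRange& entry) {
  return entry.offset >= window.offset &&
         entry.offset + entry.length <= window.offset + window.length;
}

// Hypothetical tracker: remembers which buffered row groups were evicted and
// reports the combined window of the run around the most recent eviction.
class EvictionRunTracker {
 public:
  // `buffered` maps each pre-buffered row group to the byte span it covers.
  // Assumption: spans are ordered the same way as the row group indices.
  explicit EvictionRunTracker(const std::map<int, ByteRange>& buffered) {
    for (const auto& entry : buffered) {
      slots_.push_back(Slot{entry.first, entry.second, false});
    }
  }

  // Marks `row_group` as evicted and returns the window spanning the run of
  // evicted row groups that are neighbors in the buffered set. Returns a
  // zero-length range if `row_group` was never buffered.
  ByteRange Evict(int row_group) {
    std::size_t pos = 0;
    while (pos < slots_.size() && slots_[pos].row_group != row_group) ++pos;
    if (pos == slots_.size()) return ByteRange{0, 0};
    slots_[pos].evicted = true;

    // Extend the run in both directions over already-evicted neighbors.
    // Adjacency is by position in the buffered set, not by row group index,
    // so row groups skipped by a filter do not break a run.
    std::size_t first = pos;
    while (first > 0 && slots_[first - 1].evicted) --first;
    std::size_t last = pos;
    while (last + 1 < slots_.size() && slots_[last + 1].evicted) ++last;

    const int64_t begin = slots_[first].range.offset;
    const int64_t end = slots_[last].range.offset + slots_[last].range.length;
    assert(end >= begin);  // relies on the ordering assumption above
    return ByteRange{begin, end - begin};
  }

 private:
  struct Slot {
    int row_group;
    ByteRange range;
    bool evicted;
  };
  std::vector<Slot> slots_;
};
```

In this sketch, with row groups 0, 2, and 5 buffered and one coalesced entry spanning row groups 0 and 2, evicting row group 2 alone yields a window that does not contain the entry. Evicting row group 0 afterwards widens the window to cover both, at which point a containment check such as `Contains` would let the entry be freed. The order of the two calls does not matter.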