wgtmac commented on code in PR #49855:
URL: https://github.com/apache/arrow/pull/49855#discussion_r3417879507


##########
cpp/src/parquet/file_reader.cc:
##########
@@ -432,6 +433,40 @@ class SerializedFile : public ParquetFileReader::Contents {
     return cached_source_->WaitFor(ranges);
   }
 
+  // Evict cached bytes that were populated by PreBuffer() for the given row
+  // groups and column indices. Callers should only invoke this once the
+  // corresponding row group data has been fully decoded and no readers are
+  // holding a reference to the cached buffers.
+  void EvictPreBufferedData(const std::vector<int>& row_groups,
+                            const std::vector<int>& column_indices) {
+    if (!cached_source_) {
+      return;
+    }
+    for (int row : row_groups) {

Review Comment:
   Eviction runs one row group at a time, but a cache entry is removed only 
if it is fully contained in that row group’s byte window. With default 
coalescing, a single cache entry can span adjacent row groups. Evicting row 
group 0 then leaves the entry because it extends past the end of the window, 
and evicting row group 1 leaves it because it starts before the window. The 
entry is never released, so the memory growth this PR is meant to fix can 
still occur for small or adjacent row groups.
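
   To make the failure mode concrete, here is a minimal standalone sketch. It 
does not use the Arrow API: `ByteRange`, `Contains`, `EvictContained`, and the 
byte offsets are all hypothetical stand-ins, and the real cache layout may 
differ. It only models "evict entries fully contained in a window" against one 
coalesced entry that covers two adjacent row groups.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a cached byte range (not an Arrow type).
struct ByteRange {
  int64_t offset;
  int64_t length;
  int64_t end() const { return offset + length; }
};

// True if `entry` lies entirely inside `window`.
bool Contains(const ByteRange& window, const ByteRange& entry) {
  return entry.offset >= window.offset && entry.end() <= window.end();
}

// Erase every cache entry that is fully contained in `window`.
void EvictContained(std::vector<ByteRange>& cache, const ByteRange& window) {
  assert(window.length >= 0);
  for (auto it = cache.begin(); it != cache.end();) {
    it = Contains(window, *it) ? cache.erase(it) : it + 1;
  }
}

// Assumed layout: row group 0 occupies bytes [0, 100), row group 1 occupies
// [100, 200), and coalescing merged both reads into one entry [0, 200).
int LeftoverAfterPerRowGroupEviction() {
  std::vector<ByteRange> cache{ByteRange{0, 200}};
  EvictContained(cache, ByteRange{0, 100});    // entry extends past the window
  EvictContained(cache, ByteRange{100, 100});  // entry starts before the window
  return static_cast<int>(cache.size());       // the entry is still cached
}

// One possible direction (an assumption, not necessarily the right fix here):
// evict against the union of the requested row groups' windows.
int LeftoverAfterUnionEviction() {
  std::vector<ByteRange> cache{ByteRange{0, 200}};
  EvictContained(cache, ByteRange{0, 200});  // union of both row group windows
  return static_cast<int>(cache.size());
}
```

   In this model, per-row-group eviction leaves the coalesced entry in place, 
while evicting against the combined window releases it. Whether a union 
window, overlap tracking, or something else fits `cached_source_` best is a 
design question for the PR.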


