Re: [PR] feat(parquet): make `PushBuffers` boundary-agnostic for prefetch IO [arrow-rs]

via GitHub Tue, 14 Apr 2026 13:44:59 -0700


etseidl commented on code in PR #9697:
URL: https://github.com/apache/arrow-rs/pull/9697#discussion_r3082362125



##########
parquet/src/arrow/push_decoder/reader_builder/mod.rs:
##########
@@ -610,6 +617,12 @@ impl RowGroupReaderBuilder {
                     &mut self.buffers,
                 )?;
 
+                // All data for this row group has been extracted into the
+                // InMemoryRowGroup.  Release physical buffers up to the end
+                // of this row group so streaming IO can reclaim memory.
+                self.buffers

Review Comment:
   > Also technically there is no reason that row groups have to be written in
   > order (though most writers will do that) -- for example, you could have a file
   > where the bytes for row group 0 are after the bytes for row group 1.
   
   Indeed, coworkers of mine are using this property as a means to do deletions
   from Parquet files: rewrite a single row group, tack it onto the end of the
   file, and then modify the footer to point to the new row group and ignore the
   original.
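
To make the concern concrete, here is a minimal sketch of why "release everything up to the end of this row group" may be unsafe for such files. It does not use the parquet crate: `RowGroupSpan`, `is_file_ordered`, and `safe_release_offset` are hypothetical names for illustration only, and the byte ranges are assumed to come from the footer metadata.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the byte span one row group occupies in the
/// file. This is NOT a type from the parquet crate.
#[derive(Debug, Clone)]
struct RowGroupSpan {
    bytes: Range<u64>,
}

/// True if the row groups appear in the file in the same order as they are
/// listed in the footer, with no overlap.
fn is_file_ordered(spans: &[RowGroupSpan]) -> bool {
    spans.windows(2).all(|w| w[0].bytes.end <= w[1].bytes.start)
}

/// Lowest start offset among the row groups that still have to be decoded.
/// Buffered bytes below this offset are not needed by any pending row group,
/// whatever order the row groups were written in. Returns `None` when
/// nothing is pending.
fn safe_release_offset(pending: &[RowGroupSpan]) -> Option<u64> {
    pending.iter().map(|rg| rg.bytes.start).min()
}
```

For a file where row group 0 was rewritten and appended at bytes 5000..6000 while row groups 1 and 2 still sit at 1000..2000 and 2000..3000, `is_file_ordered` is false. Releasing up to the end of row group 0 (offset 6000) would drop the bytes for row groups 1 and 2, whereas `safe_release_offset` over the pending row groups gives 1000.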



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
