alamb commented on issue #8678:
URL: https://github.com/apache/arrow-rs/issues/8678#issuecomment-3880725820

   > There is also the additional complexity of the fact that the push decoder itself uses the ParquetRecordBatchReader.
   
   Yes, that is a good point.
   
   I think the biggest question I have is "do we want to change the sync API?" -- the IO pattern of the ParquetRecordBatchReader is eager: it needs all the data loaded into memory before it begins decoding.
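   To make the eager pattern concrete, here is a minimal sketch using the existing sync reader (the `std::fs::read` call is just a stand-in for however the caller actually does the IO):
   
   ```rust
   use bytes::Bytes;
   use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;
   
   fn read_all_sync(path: &str) -> Result<(), Box<dyn std::error::Error>> {
       // Eager IO: the whole file is loaded into memory before any decoding starts
       let bytes = Bytes::from(std::fs::read(path)?);
   
       // Decoding then proceeds entirely from the in-memory buffer
       let reader = ParquetRecordBatchReaderBuilder::try_new(bytes)?
           .with_batch_size(8192)
           .build()?;
   
       for batch in reader {
           println!("decoded {} rows", batch?.num_rows());
       }
       Ok(())
   }
   ```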
   
   One of the larger differences with the async reader is that, when pushing down predicates, it will fetch only the data needed for filtering.
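   For reference, that incremental IO is driven by supplying a `RowFilter` to the async builder, roughly like this (a sketch only -- the column index and the `> 100` predicate are made up, so adjust for the real schema):
   
   ```rust
   use arrow::array::Int64Array;
   use arrow::compute::kernels::cmp::gt;
   use futures::TryStreamExt;
   use parquet::arrow::arrow_reader::{ArrowPredicateFn, RowFilter};
   use parquet::arrow::async_reader::ParquetRecordBatchStreamBuilder;
   use parquet::arrow::ProjectionMask;
   
   async fn read_filtered(file: tokio::fs::File) -> Result<(), Box<dyn std::error::Error>> {
       let builder = ParquetRecordBatchStreamBuilder::new(file).await?;
   
       // Predicate on the first leaf column; purely illustrative
       let mask = ProjectionMask::leaves(builder.parquet_schema(), [0]);
       let predicate = ArrowPredicateFn::new(mask, |batch| {
           gt(batch.column(0), &Int64Array::new_scalar(100))
       });
   
       // With a RowFilter, the stream first fetches only the pages needed to
       // evaluate the predicate, then the remaining pages for rows that pass it
       let mut stream = builder
           .with_row_filter(RowFilter::new(vec![Box::new(predicate)]))
           .build()?;
   
       while let Some(batch) = stream.try_next().await? {
           println!("decoded {} rows", batch.num_rows());
       }
       Ok(())
   }
   ```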
   
   So one question becomes -- do we want a new sync API that follows the same IO pattern as the async API?
   
   If we want such a new sync API, I think we could implement it pretty quickly with the PushDecoder.
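   Very roughly, such a sync API could be a thin loop that drives the push decoder and does blocking reads only when it asks for more bytes. The sketch below is purely illustrative -- `PushDecoder`, `DecodeResult`, and their methods here are hypothetical stand-ins, not the actual push decoder types:
   
   ```rust
   // Hypothetical shapes, for illustration only -- not the real parquet API
   enum DecodeResult {
       /// The decoder needs these byte ranges before it can make progress
       NeedsData(Vec<std::ops::Range<u64>>),
       /// A batch of rows was decoded
       Batch(arrow::record_batch::RecordBatch),
       /// All row groups have been decoded
       Finished,
   }
   
   trait PushDecoder {
       fn try_decode(&mut self) -> Result<DecodeResult, parquet::errors::ParquetError>;
       fn push_range(&mut self, range: std::ops::Range<u64>, data: bytes::Bytes);
   }
   
   /// Sketch of a sync reader that only performs the IO the decoder asks for,
   /// mirroring the async reader's incremental pattern (including filter-only
   /// fetches when predicates are pushed down).
   fn drive_sync<D: PushDecoder>(
       mut decoder: D,
       mut read_range: impl FnMut(std::ops::Range<u64>) -> std::io::Result<bytes::Bytes>,
   ) -> Result<Vec<arrow::record_batch::RecordBatch>, Box<dyn std::error::Error>> {
       let mut batches = vec![];
       loop {
           match decoder.try_decode()? {
               DecodeResult::NeedsData(ranges) => {
                   // Blocking IO happens here, and only for the requested ranges
                   for range in ranges {
                       let data = read_range(range.clone())?;
                       decoder.push_range(range, data);
                   }
               }
               DecodeResult::Batch(batch) => batches.push(batch),
               DecodeResult::Finished => return Ok(batches),
           }
       }
   }
   ```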
   
   However, the more I think about this, the more it seems like the PushDecoder now offers plenty of flexibility for both sync and async IO patterns.
   
   So maybe we should close this ticket as "won't do" 🤔 

