alamb opened a new issue, #8678:
URL: https://github.com/apache/arrow-rs/issues/8678

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   - part of https://github.com/apache/arrow-rs/issues/8000
   
   We added the ParquetPushDecoder in 
https://github.com/apache/arrow-rs/pull/7997
   
   One of the rationales is to avoid duplicating the control logic between the 
Async Reader and the Sync Reader. 
   
   However, it actually (temporarily) makes the problem worse by adding a third
copy of the control logic, à la the [xkcd standards
effect](https://xkcd.com/927/).
   
   <img width="400" alt="image" src="https://github.com/user-attachments/assets/e6886ee9-58b3-4a1e-8e88-9d2d03132b19" />
   
   
   
   **Describe the solution you'd like**
   
   Rewrite
[`ParquetRecordBatchReader`](https://docs.rs/parquet/latest/parquet/arrow/arrow_reader/struct.ParquetRecordBatchReader.html)
 using the ParquetPushDecoder.
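
   To make the idea concrete, here is a minimal sketch of a blocking reader that is only a thin driver around a push-style decoder. `ToyPushDecoder`, `Step`, and `read_all_sync` are hypothetical stand-ins invented for illustration; they are not the parquet crate's API, and a "batch" here is just the raw bytes of a range.

```rust
use std::collections::VecDeque;
use std::ops::Range;

/// What the toy decoder reports after each step (hypothetical, not a parquet type).
#[derive(Debug, PartialEq)]
enum Step {
    /// The decoder cannot make progress until these bytes are pushed.
    NeedsData(Range<usize>),
    /// A decoded "batch" (here simply the bytes of one row group).
    Data(Vec<u8>),
    /// Every row group has been decoded.
    Finished,
}

/// Toy push decoder: owns all control logic, performs no IO itself.
struct ToyPushDecoder {
    pending: VecDeque<Range<usize>>,
    buffered: Option<Vec<u8>>,
}

impl ToyPushDecoder {
    fn new(row_groups: Vec<Range<usize>>) -> Self {
        Self { pending: row_groups.into(), buffered: None }
    }

    /// Hand the decoder the bytes it asked for.
    fn push(&mut self, bytes: Vec<u8>) {
        self.buffered = Some(bytes);
    }

    /// Advance as far as possible with the data pushed so far.
    fn try_decode(&mut self) -> Step {
        if let Some(bytes) = self.buffered.take() {
            self.pending.pop_front();
            return Step::Data(bytes);
        }
        match self.pending.front() {
            Some(range) => Step::NeedsData(range.clone()),
            None => Step::Finished,
        }
    }
}

/// Blocking driver: the only sync-specific part is the slice read,
/// which stands in for a blocking file read.
fn read_all_sync(file: &[u8], mut decoder: ToyPushDecoder) -> Vec<Vec<u8>> {
    let mut batches = Vec::new();
    loop {
        match decoder.try_decode() {
            Step::NeedsData(range) => decoder.push(file[range].to_vec()),
            Step::Data(batch) => batches.push(batch),
            Step::Finished => return batches,
        }
    }
}
```

   In this shape an async driver would differ only in how it fetches the requested range, which is the deduplication this issue is after.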
   
   **Describe alternatives you've considered**
   
   The IO pattern in the sync decoder is different from the async decoder: the
sync decoder evaluates predicates on *all* row groups first and then decodes
each row group, so the push decoder will take some finagling.
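
   A toy sketch of the two orderings may help show why this matters. Everything here is hypothetical (`Event`, `RowGroup`, and both functions are made up for illustration, and row groups are plain vectors of integers): the first function filters every row group before decoding any, the second filters and decodes one row group at a time. Both return the same rows, but in a different order of operations, and it is that ordering that determines when data has to be available.

```rust
/// One step of work, recorded so the two orderings can be compared.
#[derive(Debug, PartialEq)]
enum Event {
    Filter(usize),
    Decode(usize),
}

type RowGroup = Vec<i64>;

/// Keep only the values whose mask entry is true.
fn select(group: &RowGroup, mask: &[bool]) -> Vec<i64> {
    group
        .iter()
        .zip(mask)
        .filter(|pair| *pair.1)
        .map(|(v, _)| *v)
        .collect()
}

/// Evaluate the predicate on all row groups first, then decode each one.
fn all_filters_first(groups: &[RowGroup], keep: fn(i64) -> bool) -> (Vec<i64>, Vec<Event>) {
    let mut log = Vec::new();
    let mut masks: Vec<Vec<bool>> = Vec::new();
    for (i, group) in groups.iter().enumerate() {
        log.push(Event::Filter(i));
        masks.push(group.iter().map(|&v| keep(v)).collect());
    }
    let mut out = Vec::new();
    for (i, (group, mask)) in groups.iter().zip(&masks).enumerate() {
        log.push(Event::Decode(i));
        out.extend(select(group, mask));
    }
    (out, log)
}

/// Evaluate the predicate and decode one row group at a time.
fn per_row_group(groups: &[RowGroup], keep: fn(i64) -> bool) -> (Vec<i64>, Vec<Event>) {
    let mut log = Vec::new();
    let mut out = Vec::new();
    for (i, group) in groups.iter().enumerate() {
        log.push(Event::Filter(i));
        let mask: Vec<bool> = group.iter().map(|&v| keep(v)).collect();
        log.push(Event::Decode(i));
        out.extend(select(group, &mask));
    }
    (out, log)
}
```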
   
   **Additional context**
   

