Re: [I] [DISCUSS] Decouple IO and CPU operations in the Parquet Reader (push decoder?) [arrow-rs]

via GitHub Thu, 24 Jul 2025 05:25:38 -0700


alamb commented on issue #7983:
URL: https://github.com/apache/arrow-rs/issues/7983#issuecomment-3113276966


   > Broadly speaking I agree with this, in fact my original proposal was for such a reader https://github.com/apache/arrow-rs/issues/1605 however the realities of the current code and various aspects of the parquet format meant we ended up with the current situation as a pragmatic hack. I'd love to see something better in this space.
   
   Thank you -- I am sorry, I should have mentioned your previous work in this space, and the fact that it inspired this writeup.
   
   > Fortunately we have relatively good test coverage of these quirks, so provided any rework was able to reuse these, we should avoid regressions.
   
   Indeed -- what I think would be ideal is if both the `ParquetRecordBatchReader` (sync) and the `ParquetRecordBatchStream` (async) were just "IO wrappers" around this `ParquetDecoder`.
   
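   To make the "IO wrapper" idea concrete, here is a minimal sketch of what such a split could look like. It is not the arrow-rs API: `ToyDecoder`, `DecodeResult`, `push_data`, `try_decode` and `read_all_sync` are hypothetical names, and the "file format" is just a run of little-endian `u32` values standing in for Parquet pages. The point is only that the decoder never performs IO; it reports which byte range it needs, and the wrapper decides how to fetch it.

```rust
use std::io::{self, Read, Seek, SeekFrom};
use std::ops::Range;

/// What the (hypothetical) decoder wants its caller to do next.
#[derive(Debug, PartialEq)]
enum DecodeResult {
    /// The decoder cannot progress until it is given these bytes.
    NeedsData(Range<u64>),
    /// A decoded batch (a stand-in for an Arrow `RecordBatch`).
    Batch(Vec<u32>),
    /// Everything has been decoded.
    Finished,
}

/// Toy push decoder: pure CPU work, no IO of any kind.
struct ToyDecoder {
    file_len: u64,
    offset: u64,
    chunk_len: u64,
    pending: Option<Vec<u8>>,
}

impl ToyDecoder {
    fn new(file_len: u64, values_per_batch: u64) -> Self {
        assert!(values_per_batch > 0, "batch size must be non-zero");
        Self { file_len, offset: 0, chunk_len: values_per_batch * 4, pending: None }
    }

    /// The caller hands over the bytes most recently requested.
    fn push_data(&mut self, data: Vec<u8>) {
        self.pending = Some(data);
    }

    /// Either decode the pushed bytes or say which bytes are needed next.
    fn try_decode(&mut self) -> DecodeResult {
        if self.offset >= self.file_len {
            return DecodeResult::Finished;
        }
        match self.pending.take() {
            None => {
                let end = (self.offset + self.chunk_len).min(self.file_len);
                DecodeResult::NeedsData(self.offset..end)
            }
            Some(bytes) => {
                self.offset += bytes.len() as u64;
                let values = bytes
                    .chunks_exact(4)
                    .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
                    .collect();
                DecodeResult::Batch(values)
            }
        }
    }
}

/// Sync "IO wrapper": the only place that touches `Read + Seek`.
fn read_all_sync<R: Read + Seek>(
    mut reader: R,
    file_len: u64,
    values_per_batch: u64,
) -> io::Result<Vec<Vec<u32>>> {
    let mut decoder = ToyDecoder::new(file_len, values_per_batch);
    let mut batches = Vec::new();
    loop {
        match decoder.try_decode() {
            DecodeResult::NeedsData(range) => {
                let mut buf = vec![0u8; (range.end - range.start) as usize];
                reader.seek(SeekFrom::Start(range.start))?;
                reader.read_exact(&mut buf)?;
                decoder.push_data(buf);
            }
            DecodeResult::Batch(batch) => batches.push(batch),
            DecodeResult::Finished => return Ok(batches),
        }
    }
}
```

   Under these assumptions, an async wrapper would presumably run the same loop and only swap the `NeedsData` arm for an awaited range fetch (for example from object storage), which is why the existing decoding tests could in principle be reused against the decoder alone.
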
   Now we just need to find someone with enough time to do the work -- lol 🎣 


