Re: [PR] Parquet: Do not compress v2 data page when compress is bad quality [arrow-rs]

via GitHub Tue, 16 Sep 2025 03:53:42 -0700


alamb commented on PR #8257:
URL: https://github.com/apache/arrow-rs/pull/8257#issuecomment-3297735001


   > I agree. I’ve been thinking more about this, especially since my focus is
   > primarily on cuDF rather than DataFusion.
   >
   > At a high level, it’s a trade-off between computation (specifically
   > decompression) and I/O (file size reduction). In CPU scenarios like
   > DataFusion, I believe reading compressed Parquet files tends to be
   > computation-bound.
   
   Yes, I 100% agree with this conclusion, and I think giving users more
   control over this tradeoff via a tuning knob would make a lot of sense.
   
   I filed a ticket to capture the discussion:
    - https://github.com/apache/arrow-rs/issues/8358
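
   To make the idea of a tuning knob concrete, here is a minimal sketch of
   what a ratio-based policy might look like. It is only an illustration: the
   names `CompressionPolicy`, `max_ratio`, `PageEncoding`, and `choose` are
   hypothetical and are not part of the arrow-rs or parquet crate API. The
   sketch relies only on the fact that a v2 data page header carries an
   `is_compressed` flag, so a writer can store a page uncompressed when
   compression saves too little.

```rust
/// Hypothetical tuning knob (not an arrow-rs API): keep the compressed bytes
/// only when they are at most `max_ratio` times the uncompressed size.
#[derive(Debug, Clone, Copy)]
pub struct CompressionPolicy {
    /// Assumed semantics: 1.0 keeps any page that does not grow,
    /// 0.9 requires at least a 10% size reduction.
    pub max_ratio: f64,
}

/// Outcome of the decision for a single data page.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PageEncoding {
    Compressed,
    Uncompressed,
}

impl CompressionPolicy {
    /// Decide whether the compressed form of a page is worth keeping.
    pub fn choose(&self, uncompressed_len: usize, compressed_len: usize) -> PageEncoding {
        // An empty page gains nothing from compression.
        if uncompressed_len == 0 {
            return PageEncoding::Uncompressed;
        }
        let ratio = compressed_len as f64 / uncompressed_len as f64;
        if ratio <= self.max_ratio {
            PageEncoding::Compressed
        } else {
            // Poor ratio: skip compression so readers avoid the decompression cost.
            PageEncoding::Uncompressed
        }
    }
}
```

   With `max_ratio` set to 0.9, for example, a 100-byte page that compresses
   to 98 bytes would be written uncompressed, while one that compresses to 50
   bytes would stay compressed. A lower threshold favors compute-bound readers;
   a higher one favors I/O-bound readers.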


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
