alamb commented on PR #8257:
URL: https://github.com/apache/arrow-rs/pull/8257#issuecomment-3297735001
> I agree. I’ve been thinking more about this, especially since my focus is
> primarily on cuDF rather than DataFusion.
>
> At a high level, it’s a trade-off between computation (specifically
> decompression) and I/O (file size reduction). In CPU scenarios like datafusion,
> I believe reading compressed Parquet files tends to be computation-bound

Yes, I 100% agree with this conclusion, and I think allowing users to have
more control over this trade-off via a tuning knob would make a lot of sense.
I filed a ticket to try to capture the discussion:
- https://github.com/apache/arrow-rs/issues/8358