Re: [PR] feat(parquet): dictionary fallback heuristic based on compression efficiency [arrow-rs]

via GitHub Tue, 14 Apr 2026 02:25:29 -0700


mzabaluev commented on PR #9700:
URL: https://github.com/apache/arrow-rs/pull/9700#issuecomment-4242787364


   Good points, @etseidl. Our motivation for adding this is that in some 
cases, e.g. with high cardinality, the Rust parquet writer produces much 
larger encoded Parquet output than the Spark workloads we're aiming to 
replace. A default option that enables a heuristic akin to the one hardcoded 
into parquet-java would therefore get us on par (or maybe better, because this 
implementation may choose to fall back at any page chunk in the dataframe).


