mzabaluev commented on PR #9700: URL: https://github.com/apache/arrow-rs/pull/9700#issuecomment-4242787364
Good points @etseidl. Our motivation for adding this is that in some cases, e.g. with high-cardinality data, the Rust parquet writer produces much larger encoded Parquet output than the Spark workloads we're aiming to replace. A default option that enables a heuristic akin to the one hardcoded into parquet-java would get us on par, or maybe better, because this implementation may choose to fall back at any page chunk in the dataframe.
