mzabaluev commented on code in PR #9700:
URL: https://github.com/apache/arrow-rs/pull/9700#discussion_r3085776009
##########
parquet/src/arrow/arrow_writer/mod.rs:
##########
@@ -4827,6 +4895,48 @@ mod tests {
assert_eq!(get_dict_page_size(col1_meta), 1024 * 1024 * 4);
}
+ #[test]
+ fn test_dict_page_size_decided_by_compression_fallback() {
Review Comment:
I have modified the test in
[1b6dd37](https://github.com/apache/arrow-rs/pull/9700/commits/1b6dd3756c447111baf78af969875b7790909070)
to demonstrate a case where even an early fallback decision yields about 12%
compression. But I generally agree with your assessment, so more work is needed.
This test also exposes another quirk: a dictionary page is still flushed to
encode the first data page, even though it brings no benefit. Parquet-java, by
contrast, hands the accumulated values over to the plain encoder to be re-encoded.
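
To make the hand-over concrete, here is a rough sketch of the idea using a toy encoder. All names here (`ToyDictEncoder`, `Page`, `flush_dictionary`, `flush_as_plain`) are hypothetical and exist only for illustration; this is not the arrow-rs or parquet-java API, and real page encoding is considerably more involved.

```rust
use std::collections::HashMap;

/// Hypothetical page kinds for illustration only.
enum Page {
    /// The dictionary itself: one entry per distinct value.
    Dictionary(Vec<i32>),
    /// A data page holding indices into the dictionary.
    DictIndices(Vec<u32>),
    /// A data page holding the raw values.
    Plain(Vec<i32>),
}

/// Toy dictionary encoder that buffers values as dictionary indices.
struct ToyDictEncoder {
    dict: Vec<i32>,
    lookup: HashMap<i32, u32>,
    indices: Vec<u32>,
}

impl ToyDictEncoder {
    fn new() -> Self {
        ToyDictEncoder {
            dict: Vec::new(),
            lookup: HashMap::new(),
            indices: Vec::new(),
        }
    }

    /// Buffer one value, adding it to the dictionary if it is new.
    fn put(&mut self, value: i32) {
        let next = self.dict.len() as u32;
        let idx = *self.lookup.entry(value).or_insert(next);
        if idx == next {
            self.dict.push(value);
        }
        self.indices.push(idx);
    }

    /// Keep dictionary encoding: emit a dictionary page and an index page.
    fn flush_dictionary(self) -> Vec<Page> {
        vec![Page::Dictionary(self.dict), Page::DictIndices(self.indices)]
    }

    /// Fall back before anything was written: map the buffered indices back
    /// to their values and emit a single plain page, with no dictionary page.
    fn flush_as_plain(self) -> Vec<Page> {
        let values: Vec<i32> = self
            .indices
            .iter()
            .map(|&i| self.dict[i as usize])
            .collect();
        vec![Page::Plain(values)]
    }
}
```

In this sketch, choosing `flush_as_plain` at fallback time avoids writing a dictionary page that only the first data page would ever reference, at the cost of re-encoding the buffered values once.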