alamb commented on code in PR #7574:
URL: https://github.com/apache/arrow-rs/pull/7574#discussion_r2121421343

##########
parquet/tests/arrow_reader/mod.rs:
##########
@@ -1027,11 +1058,15 @@ async fn make_test_file_rg(scenario: Scenario, row_per_group: usize) -> NamedTem
         .tempfile()
         .expect("tempfile creation");
 
-    let props = WriterProperties::builder()
+    let mut builder = WriterProperties::builder()
         .set_max_row_group_size(row_per_group)
         .set_bloom_filter_enabled(true)
-        .set_statistics_enabled(EnabledStatistics::Page)
-        .build();
+        .set_statistics_enabled(EnabledStatistics::Page);
+    if matches!(scenario, Scenario::TruncatedUTF8) {

Review Comment:
   Instead of using `matches!` here, could you please add a method to `Scenario`, like `if scenario.truncate_stats()`?
   
   That way:
   1. There is a clearer place to add the documentation
   2. It is easier to see by looking at `Scenario` that it may truncate the stats as well


##########
parquet/tests/arrow_reader/statistics.rs:
##########
@@ -354,7 +376,45 @@ impl Test<'_> {
 //
 // Remaining cases
 // f64::NAN
-// - Using truncated statistics ("exact min value" and "exact max value" https://docs.rs/parquet/latest/parquet/file/statistics/enum.Statistics.html#method.max_is_exact)
+
+#[tokio::test]
+async fn test_max_and_min_value_truncated() {
+    let reader = TestReader {
+        scenario: Scenario::TruncatedUTF8,
+        row_per_group: 5,
+    }
+    .build()
+    .await;
+
+    Test {
+        reader: &reader,
+        // min is truncated to
+        // 1. `"a".repeate(64)`, original value is `"a".repeat(64) + "1"`
+        // 2. "", since there's a null in the second row group
+        // 3. "j"
+        expected_min: Arc::new(StringArray::from(vec![&("a".repeat(64)), "", "j"])),

Review Comment:
   I agree; I would expect NULL in the second row group (for unknown statistics).
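
For illustration, the helper requested in the first review comment might look roughly like the sketch below. Only the method name `truncate_stats` and the variant `Scenario::TruncatedUTF8` come from the thread. The enum here is a cut-down stand-in for the real test enum, the `Int` variant is a placeholder for "any other scenario", and the method body and doc comment are assumptions, not the code that was merged.

```rust
/// Stand-in for the test suite's `Scenario` enum (assumption: the real enum
/// has many more variants; `Int` is only a placeholder for the others).
#[derive(Debug, Clone, Copy)]
enum Scenario {
    Int,
    TruncatedUTF8,
}

impl Scenario {
    /// Returns `true` if files written for this scenario should have their
    /// min/max statistics truncated.
    ///
    /// Hypothetical helper: the name is taken from the review comment, the
    /// body is a guess at the simplest implementation.
    fn truncate_stats(&self) -> bool {
        matches!(self, Scenario::TruncatedUTF8)
    }
}

fn main() {
    // The call site then reads `if scenario.truncate_stats() { ... }`
    // instead of spelling out `matches!` inline.
    for scenario in [Scenario::Int, Scenario::TruncatedUTF8] {
        println!("{scenario:?}: truncate_stats = {}", scenario.truncate_stats());
    }
}
```

Keeping the check on `Scenario` gives the doc comment a natural home and lets a reader of the enum see that some scenarios change how statistics are written, which are the two points made in the review.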