CookiePieWw commented on code in PR #7574:
URL: https://github.com/apache/arrow-rs/pull/7574#discussion_r2119243590
##########
parquet/tests/arrow_reader/statistics.rs:
##########
@@ -354,7 +376,45 @@ impl Test<'_> {
 //
 // Remaining cases
 // f64::NAN
-// - Using truncated statistics ("exact min value" and "exact max value" https://docs.rs/parquet/latest/parquet/file/statistics/enum.Statistics.html#method.max_is_exact)
+
+#[tokio::test]
+async fn test_max_and_min_value_truncated() {
+    let reader = TestReader {
+        scenario: Scenario::TruncatedUTF8,
+        row_per_group: 5,
+    }
+    .build()
+    .await;
+
+    Test {
+        reader: &reader,
+        // min is truncated to
+        // 1. `"a".repeate(64)`, original value is `"a".repeat(64) + "1"`
+        // 2. "", since there's a null in the second row group
+        // 3. "j"
+        expected_min: Arc::new(StringArray::from(vec![&("a".repeat(64)), "", "j"])),

Review Comment:
   When calculating the minimum here, we get an empty string where it should be a null value.

   https://github.com/apache/arrow-rs/blob/9c5c5c73a7d8d0faac3dc6511d2fbfdb197fdd3b/parquet/src/arrow/arrow_writer/byte_array.rs#L578

   Because there is a null value in the second row group, we end up with an exact empty string as the minimum value. Is this expected?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org