alamb commented on code in PR #7574: URL: https://github.com/apache/arrow-rs/pull/7574#discussion_r2119000113
########## parquet/src/arrow/arrow_reader/statistics.rs: ########## @@ -1403,6 +1403,48 @@ impl<'a> StatisticsConverter<'a> { max_statistics(data_type, iter, self.physical_type) } + /// Extract the `is_max_value_exact` flags from row group statistics in [`RowGroupMetaData`] + /// + /// See docs on [`Self::row_group_mins`] for details Review Comment: ```suggestion /// See docs on [`Self::row_group_maxes`] for details ``` ########## parquet/tests/arrow_reader/mod.rs: ########## @@ -987,6 +991,17 @@ fn create_data_batch(scenario: Scenario) -> Vec<RecordBatch> { make_utf8_batch(vec![Some("e"), Some("f"), Some("g"), Some("h"), Some("i")]), ] } + Scenario::TruncatedUTF8 => { + // Make utf8 batch with strings longer than 64 bytes + // to check truncation of row group statistics + vec![make_utf8_batch(vec![ Review Comment: Could you also please add coverage for: 1. They are not all truncated (e.g. make at least one of the groups less than 64 bytes) 2. There is at least one null? I think it is important test coverage to include the is_exact flag for those cases too -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org