alamb commented on code in PR #7574:
URL: https://github.com/apache/arrow-rs/pull/7574#discussion_r2119000113


##########
parquet/src/arrow/arrow_reader/statistics.rs:
##########
@@ -1403,6 +1403,48 @@ impl<'a> StatisticsConverter<'a> {
         max_statistics(data_type, iter, self.physical_type)
     }
 
+    /// Extract the `is_max_value_exact` flags from row group statistics in 
[`RowGroupMetaData`]
+    ///
+    /// See docs on [`Self::row_group_mins`] for details

Review Comment:
   ```suggestion
       /// See docs on [`Self::row_group_maxes`] for details
   ```



##########
parquet/tests/arrow_reader/mod.rs:
##########
@@ -987,6 +991,17 @@ fn create_data_batch(scenario: Scenario) -> 
Vec<RecordBatch> {
                 make_utf8_batch(vec![Some("e"), Some("f"), Some("g"), 
Some("h"), Some("i")]),
             ]
         }
+        Scenario::TruncatedUTF8 => {
+            // Make utf8 batch with strings longer than 64 bytes
+            // to check truncation of row group statistics
+            vec![make_utf8_batch(vec![

Review Comment:
   Could you also please add coverage for:
   1. They are not all truncated (e.g. make at least one of the groups less 
than 64 bytes)
   2. There is at least one null?
   
   I think it is important test coverage to include the is_exact flag for those 
cases too



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to