CookiePieWw commented on code in PR #7574:
URL: https://github.com/apache/arrow-rs/pull/7574#discussion_r2119243590


##########
parquet/tests/arrow_reader/statistics.rs:
##########
@@ -354,7 +376,45 @@ impl Test<'_> {
 //
 // Remaining cases
 //   f64::NAN
-// - Using truncated statistics  ("exact min value" and "exact max value" https://docs.rs/parquet/latest/parquet/file/statistics/enum.Statistics.html#method.max_is_exact)
+
+#[tokio::test]
+async fn test_max_and_min_value_truncated() {
+    let reader = TestReader {
+        scenario: Scenario::TruncatedUTF8,
+        row_per_group: 5,
+    }
+    .build()
+    .await;
+
+    Test {
+        reader: &reader,
+        // min is truncated to
+        // 1. `"a".repeate(64)`, original value is `"a".repeat(64) + "1"`
+        // 2. "", since there's a null in the second row group
+        // 3. "j"
+        expected_min: Arc::new(StringArray::from(vec![&("a".repeat(64)), "", "j"])),

Review Comment:
   When calculating the minimum here, we get an empty string when it should be a null value:
   https://github.com/apache/arrow-rs/blob/9c5c5c73a7d8d0faac3dc6511d2fbfdb197fdd3b/parquet/src/arrow/arrow_writer/byte_array.rs#L578
   The second row group contains a null value, yet the minimum reported for that group is an exact empty string. Is this expected?
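
   To make the question concrete, here is a minimal std-only sketch of the two behaviours being compared. It is not the writer code linked above: `min_skipping_nulls` and `min_nulls_as_empty` are hypothetical helpers, and the sample values are made up rather than taken from the `TruncatedUTF8` scenario. It only assumes that a null slot may end up being read as an empty byte string when the minimum is computed.

```rust
// Hypothetical helpers for illustration only; this is NOT the arrow-rs
// writer code. Sample values are invented.

/// Minimum over the non-null values only; `None` if every value is null.
fn min_skipping_nulls<'a>(values: &[Option<&'a str>]) -> Option<&'a str> {
    values.iter().flatten().copied().min()
}

/// Minimum when a null slot is read as an empty string
/// (the behaviour the comment above is asking about).
fn min_nulls_as_empty<'a>(values: &[Option<&'a str>]) -> &'a str {
    values
        .iter()
        .copied()
        .map(|v| v.unwrap_or(""))
        .min()
        .unwrap_or("")
}

fn main() {
    // A row group that contains one null value.
    let group = [Some("e"), None, Some("g")];

    // Nulls ignored: the minimum is the smallest real value.
    println!("{:?}", min_skipping_nulls(&group)); // prints Some("e")

    // Null treated as "": the empty string sorts first and becomes the minimum.
    println!("{:?}", min_nulls_as_empty(&group)); // prints ""
}
```

   If the second variant is what effectively happens, the empty-string minimum would be explained by the null, not by any real value in the group.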


