kbendick commented on code in PR #4734:
URL: https://github.com/apache/iceberg/pull/4734#discussion_r869515613


##########
data/src/test/java/org/apache/iceberg/data/TestMetricsRowGroupFilter.java:
##########
@@ -298,17 +298,32 @@ public void testNoNulls() {
 
   @Test
   public void testIsNaN() {
+    Assume.assumeTrue("Avro files do not have row group statistics", format != 
FileFormat.AVRO);
+
     boolean shouldRead = shouldRead(isNaN("all_nans"));
-    Assert.assertTrue("Should read: NaN counts are not tracked in Parquet 
metrics", shouldRead);
+    Assert.assertTrue(String.format("Should read: NaN counts are not tracked 
in %s metrics", format), shouldRead);
 
     shouldRead = shouldRead(isNaN("some_nans"));
-    Assert.assertTrue("Should read: NaN counts are not tracked in Parquet 
metrics", shouldRead);
-
-    shouldRead = shouldRead(isNaN("no_nans"));
-    Assert.assertTrue("Should read: NaN counts are not tracked in Parquet 
metrics", shouldRead);
+    Assert.assertTrue(String.format("Should read: NaN counts are not tracked 
in %s metrics", format), shouldRead);
 
     shouldRead = shouldRead(isNaN("all_nulls"));
     Assert.assertFalse("Should skip: all null column will not contain nan 
value", shouldRead);
+
+    if (format == FileFormat.ORC) {
+      testIsNanOrc();
+    } else {
+      testIsNanParquet();
+    }
+  }
+
+  private void testIsNanParquet() {
+    boolean shouldRead = shouldRead(isNaN("no_nans"));
+    Assert.assertTrue("Should read: NaN counts are not tracked in Parquet 
metrics", shouldRead);
+  }
+
+  private void testIsNanOrc() {
+    boolean shouldRead = shouldRead(isNaN("no_nans"));
+    Assert.assertFalse("Should skip: no nans will not contain nan value", 
shouldRead);

Review Comment:
   For the ORC assertion message, it might be more clear if we're a little more 
descriptive, given that we've stated that the `NaN counts are not tracked in 
orc metrics` already.
   
   That's true for counts, but ORC tracks whether or not a column in a row 
group contains any NaN values if I'm not mistaken, so we should make that 
distinction more clear as people might not be that familiar with ORC when 
editing this test later.
   
   Something like `Should skip: The existence of NaN in a column is tracked in 
ORC metrics`. WDYT? Please correct me if my understanding is wrong. 😄 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to