yyanyy commented on a change in pull request #1820:
URL: https://github.com/apache/iceberg/pull/1820#discussion_r538784447



##########
File path: core/src/test/java/org/apache/iceberg/TestManifestReaderStats.java
##########
@@ -149,19 +150,35 @@ private void assertFullStats(DataFile dataFile) {
     Assert.assertNull(dataFile.columnSizes());
     Assert.assertEquals(VALUE_COUNT, dataFile.valueCounts());
     Assert.assertEquals(NULL_VALUE_COUNTS, dataFile.nullValueCounts());
+    Assert.assertEquals(NAN_VALUE_COUNTS, dataFile.nanValueCounts());
     Assert.assertEquals(LOWER_BOUNDS, dataFile.lowerBounds());
     Assert.assertEquals(UPPER_BOUNDS, dataFile.upperBounds());
-    Assert.assertEquals(NAN_VALUE_COUNTS, dataFile.nanValueCounts());
+
+    Assert.assertEquals(FILE_PATH, dataFile.path()); // always select file 
path in all test cases
   }
 
   private void assertStatsDropped(DataFile dataFile) {
-    Assert.assertEquals(3, dataFile.recordCount()); // always select record 
count in all test cases
+    Assert.assertEquals(3, dataFile.recordCount()); // record count is not 
considered as droppable stats
     Assert.assertNull(dataFile.columnSizes());
     Assert.assertNull(dataFile.valueCounts());
     Assert.assertNull(dataFile.nullValueCounts());
+    Assert.assertNull(dataFile.nanValueCounts());
     Assert.assertNull(dataFile.lowerBounds());
     Assert.assertNull(dataFile.upperBounds());
+
+    Assert.assertEquals(FILE_PATH, dataFile.path()); // always select file 
path in all test cases
+  }
+
+  private void assertNoStats(DataFile dataFile) {
+    Assert.assertEquals(-1L, dataFile.recordCount());

Review comment:
       This code actually is not testing results from `copyWithoutStats`, 
`copyWithoutStats` results are actually tested by another method 
`assertStatsDropped` which still keeps `recordCount`. In the test case that 
uses this `assertNoStats`, only `select` but no `filter` operation is applied 
to the manifest reader, so the reader doesn't project stats columns when 
reading (since manifest entries don't have to go through evaluators when no 
filter is applied), and thus only return the field being selected (in this case 
`file_path`). But I can see that this name is confusing so I updated it a bit 
to hopefully makes things clearer. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to