yyanyy commented on a change in pull request #1820:
URL: https://github.com/apache/iceberg/pull/1820#discussion_r538784447
##########
File path: core/src/test/java/org/apache/iceberg/TestManifestReaderStats.java
##########
@@ -149,19 +150,35 @@ private void assertFullStats(DataFile dataFile) {
Assert.assertNull(dataFile.columnSizes());
Assert.assertEquals(VALUE_COUNT, dataFile.valueCounts());
Assert.assertEquals(NULL_VALUE_COUNTS, dataFile.nullValueCounts());
+ Assert.assertEquals(NAN_VALUE_COUNTS, dataFile.nanValueCounts());
Assert.assertEquals(LOWER_BOUNDS, dataFile.lowerBounds());
Assert.assertEquals(UPPER_BOUNDS, dataFile.upperBounds());
- Assert.assertEquals(NAN_VALUE_COUNTS, dataFile.nanValueCounts());
+
+ Assert.assertEquals(FILE_PATH, dataFile.path()); // always select file
path in all test cases
}
private void assertStatsDropped(DataFile dataFile) {
- Assert.assertEquals(3, dataFile.recordCount()); // always select record
count in all test cases
+ Assert.assertEquals(3, dataFile.recordCount()); // record count is not
considered as droppable stats
Assert.assertNull(dataFile.columnSizes());
Assert.assertNull(dataFile.valueCounts());
Assert.assertNull(dataFile.nullValueCounts());
+ Assert.assertNull(dataFile.nanValueCounts());
Assert.assertNull(dataFile.lowerBounds());
Assert.assertNull(dataFile.upperBounds());
+
+ Assert.assertEquals(FILE_PATH, dataFile.path()); // always select file
path in all test cases
+ }
+
+ private void assertNoStats(DataFile dataFile) {
+ Assert.assertEquals(-1L, dataFile.recordCount());
Review comment:
This code actually is not testing results from `copyWithoutStats`,
`copyWithoutStats` results are actually tested by another method
`assertStatsDropped` which still keeps `recordCount`. In the test case that
uses this `assertNoStats`, only `select` but no `filter` operation is applied
to the manifest reader, so the reader doesn't project stats columns when
reading (since manifest entries don't have to go through evaluators when no
filter is applied), and thus only return the field being selected (in this case
`file_path`). But I can see that this name is confusing so I updated it a bit
to hopefully makes things clearer.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]