rdblue commented on a change in pull request #600: Implement in and notIn in
multiple visitors
URL: https://github.com/apache/incubator-iceberg/pull/600#discussion_r355151098
##########
File path:
parquet/src/test/java/org/apache/iceberg/parquet/TestDictionaryRowGroupFilter.java
##########
@@ -692,4 +694,66 @@ public void testMissingDictionaryPageForColumn() {
         () -> new ParquetDictionaryRowGroupFilter(SCHEMA, notEqual("some_nulls", "some"))
             .shouldRead(parquetSchema, rowGroupMetadata, descriptor -> null));
   }
+
+  @Test
+  public void testIntegerIn() {
+    boolean shouldRead = new ParquetDictionaryRowGroupFilter(SCHEMA, in("id", 5, 6))
+        .shouldRead(parquetSchema, rowGroupMetadata, dictionaryStore);
+    Assert.assertFalse("Should not read: id below lower bound (5 < 30, 6 < 30)", shouldRead);
+
+    shouldRead = new ParquetDictionaryRowGroupFilter(SCHEMA, in("id", 28, 29))
+        .shouldRead(parquetSchema, rowGroupMetadata, dictionaryStore);
+    Assert.assertFalse("Should not read: id below lower bound (28 < 30, 29 < 30)", shouldRead);
+
+    shouldRead = new ParquetDictionaryRowGroupFilter(SCHEMA, in("id", 30, 31))
+        .shouldRead(parquetSchema, rowGroupMetadata, dictionaryStore);
+    Assert.assertTrue("Should read: id equal to lower bound (30 == 30)", shouldRead);
+
+    shouldRead = new ParquetDictionaryRowGroupFilter(SCHEMA, in("id", 75, 76))
+        .shouldRead(parquetSchema, rowGroupMetadata, dictionaryStore);
+    Assert.assertTrue("Should read: id between lower and upper bounds (30 < 75 < 79, 30 < 76 < 79)", shouldRead);
+
+    shouldRead = new ParquetDictionaryRowGroupFilter(SCHEMA, in("id", 79, 80))
+        .shouldRead(parquetSchema, rowGroupMetadata, dictionaryStore);
+    Assert.assertTrue("Should read: id equal to upper bound (79 == 79)", shouldRead);
+
+    shouldRead = new ParquetDictionaryRowGroupFilter(SCHEMA, in("id", 80, 81))
+        .shouldRead(parquetSchema, rowGroupMetadata, dictionaryStore);
+    Assert.assertFalse("Should not read: id above upper bound (80 > 79, 81 > 79)", shouldRead);
+
+    shouldRead = new ParquetDictionaryRowGroupFilter(SCHEMA, in("id", 85, 86))
+        .shouldRead(parquetSchema, rowGroupMetadata, dictionaryStore);
+    Assert.assertFalse("Should not read: id above upper bound (85 > 79, 86 > 79)", shouldRead);
+  }
+
+  @Test
+  public void testIntegerNotIn() {
+    boolean shouldRead = new ParquetDictionaryRowGroupFilter(SCHEMA, notIn("id", 5, 6))
Review comment:
All of these cases only test the path where the size of the set is > 1.
The `notIn` tests should also cover these cases:
* The `notIn` set is a subset of the dictionary
* The dictionary is a subset of the `notIn` set
* The two sets are disjoint
* The two sets are equal
I'd also like to see some tests on the `some_nulls` column.
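
A minimal sketch of the four set relationships listed above, using plain Java sets rather than the real filter. `NotInDictionarySketch`, `shouldRead`, and `setOf` are hypothetical names, and the dictionary values are made up for illustration. The rule it models (a `notIn` predicate can skip a row group only when every dictionary value is in the set) is an assumption about the intended semantics; nulls and non-dictionary-encoded pages are not modeled, so it says nothing about the `some_nulls` cases.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class NotInDictionarySketch {
  // Hypothetical stand-in for the row-group decision, not Iceberg's code:
  // rows might match notIn only if some dictionary value is outside the set.
  static boolean shouldRead(Set<Integer> dictionary, Set<Integer> notInSet) {
    Set<Integer> remaining = new HashSet<>(dictionary);
    remaining.removeAll(notInSet);
    return !remaining.isEmpty();
  }

  // Small helper to build a mutable set from literal values.
  static Set<Integer> setOf(Integer... values) {
    return new HashSet<>(Arrays.asList(values));
  }

  public static void main(String[] args) {
    // Illustrative dictionary; the real test data is not shown in this thread.
    Set<Integer> dictionary = setOf(30, 31, 32);

    // The notIn set is a subset of the dictionary: 32 survives, so read.
    System.out.println(shouldRead(dictionary, setOf(30, 31)));         // true

    // The dictionary is a subset of the notIn set: nothing survives, so skip.
    System.out.println(shouldRead(dictionary, setOf(30, 31, 32, 33))); // false

    // The two sets are disjoint: every dictionary value survives, so read.
    System.out.println(shouldRead(dictionary, setOf(5, 6)));           // true

    // The two sets are equal: nothing survives, so skip.
    System.out.println(shouldRead(dictionary, setOf(30, 31, 32)));     // false
  }
}
```

Each case maps to one assertion in `testIntegerNotIn`, with the literal values chosen against whatever the test dictionary for `id` actually contains.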