rdblue commented on a change in pull request #600: Implement in and notIn in 
multiple visitors
URL: https://github.com/apache/incubator-iceberg/pull/600#discussion_r361738745
 
 

 ##########
 File path: 
api/src/main/java/org/apache/iceberg/expressions/StrictMetricsEvaluator.java
 ##########
 @@ -308,11 +310,69 @@ public Boolean or(Boolean leftResult, Boolean 
rightResult) {
 
     @Override
     public <T> Boolean in(BoundReference<T> ref, Set<T> literalSet) {
+      Integer id = ref.fieldId();
+      Types.NestedField field = struct.field(id);
+      Preconditions.checkNotNull(field, "Cannot filter by nested column: %s", 
schema.findField(id));
+
+      if (canContainNulls(id)) {
+        return ROWS_MIGHT_NOT_MATCH;
+      }
+
+      if (lowerBounds != null && lowerBounds.containsKey(id) &&
+          upperBounds != null && upperBounds.containsKey(id)) {
 
 Review comment:
   Not something we need to fix, but I was thinking about this and wanted to 
note it: we may be able to do better than this for small ranges. For example, 
if the set is `Set(2, 3, 4, 5)` and the lower/upper bounds are `[2, 4]`, then 
the set does guarantee that all values are present. We could test this by 
detecting a small range and testing whether each value in the range is in the 
set.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to