rdblue commented on a change in pull request #123: Add support for struct field 
based filtering
URL: https://github.com/apache/incubator-iceberg/pull/123#discussion_r283583467
 
 

 ##########
 File path: api/src/main/java/org/apache/iceberg/expressions/BoundReference.java
 ##########
 @@ -19,57 +19,238 @@
 
 package org.apache.iceberg.expressions;
 
+import java.io.Serializable;
 import java.util.List;
+import java.util.Map;
+
+import com.google.common.collect.Maps;
+import org.apache.iceberg.Schema;
 import org.apache.iceberg.StructLike;
 import org.apache.iceberg.exceptions.ValidationException;
 import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.TypeUtil;
 import org.apache.iceberg.types.Types;
 
 public class BoundReference<T> implements Reference {
   private final int fieldId;
-  private final Type type;
+  private final Accessor<StructLike> accessor;
   private final int pos;
 
-  BoundReference(Types.StructType struct, int fieldId) {
+  BoundReference(Schema schema, int fieldId) {
     this.fieldId = fieldId;
-    this.pos = find(fieldId, struct);
-    this.type = struct.fields().get(pos).type();
+
+    Map<Integer, Accessor<StructLike>> accessors = lazyIdToAccessor(schema);
+
+    this.accessor = accessors.get(fieldId);
+
+    // only look for top level field position
+    this.pos = findTopFieldPos(fieldId, schema.asStruct());
+
   }
 
-  private int find(int fieldId, Types.StructType struct) {
+
+  private int findTopFieldPos(int fieldId, Types.StructType struct) {
     List<Types.NestedField> fields = struct.fields();
     for (int i = 0; i < fields.size(); i += 1) {
       if (fields.get(i).fieldId() == fieldId) {
         return i;
       }
     }
-    throw new ValidationException(
-        "Cannot find top-level field id %d in struct: %s", fieldId, struct);
+    return -1;
   }
 
   public Type type() {
-    return type;
+    return accessor.type();
   }
 
   public int fieldId() {
     return fieldId;
   }
 
   public int pos() {
+    if (pos == -1) {
 
 Review comment:
   I think you're saying that this is technically safe, and that's correct. The 
only code path that calls this is that evaluator, and it always binds to a 
flat partition structure. That's why it can use the position: it knows that the 
array of partition summaries is in the same order as the tuple of partition 
values.
   
   My point here is that it is brittle to bind to a struct type and then use 
the position for something else, and that it is a bad API to expose the 
position when no normal path uses it directly. Instead, maybe that evaluator 
should get the position from the first accessor. That way, it validates that 
the partition field is not nested (it should be a single-position accessor). 
My original thought was to add a method to the accessor that returns one of 
the partition summaries from a list. That would work, too, but it requires 
another accessor method, so it isn't a great idea.
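
   A rough sketch of what "get the position from the first accessor" could 
look like. The `Accessor` interface and its `positions()` method below are 
hypothetical stand-ins for illustration, not the interface in this PR; the 
only assumption is that an accessor can report the path of positions it 
follows from the top-level struct down to the field.

```java
import java.util.Arrays;
import java.util.List;

public class PositionFromAccessorSketch {
  // Hypothetical stand-in, NOT Iceberg's Accessor: reports the path of
  // positions followed from the top-level struct down to the field.
  interface Accessor {
    List<Integer> positions();
  }

  // Returns the top-level position, rejecting accessors that reach into a
  // nested struct (a path longer than one position).
  static int topLevelPosition(Accessor accessor) {
    List<Integer> path = accessor.positions();
    if (path.size() != 1) {
      throw new IllegalArgumentException(
          "Expected a top-level (non-nested) field, but accessor path was: " + path);
    }
    return path.get(0);
  }

  public static void main(String[] args) {
    // A flat partition field at position 2 resolves to that position.
    Accessor flat = () -> Arrays.asList(2);
    System.out.println(topLevelPosition(flat));

    // A field nested inside another struct is rejected instead of silently
    // yielding a position that does not index the partition summaries.
    Accessor nested = () -> Arrays.asList(0, 1);
    try {
      topLevelPosition(nested);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

   With something along these lines the evaluator would own the "must be 
flat" check, and `BoundReference` would not need to expose `pos()` or return 
-1 for nested fields.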

