prodeezy commented on a change in pull request #123: Add support for struct 
field based filtering
URL: https://github.com/apache/incubator-iceberg/pull/123#discussion_r283579316
 
 

 ##########
 File path: api/src/main/java/org/apache/iceberg/expressions/BoundReference.java
 ##########
 @@ -19,57 +19,238 @@
 
 package org.apache.iceberg.expressions;
 
+import java.io.Serializable;
 import java.util.List;
+import java.util.Map;
+
+import com.google.common.collect.Maps;
+import org.apache.iceberg.Schema;
 import org.apache.iceberg.StructLike;
 import org.apache.iceberg.exceptions.ValidationException;
 import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.TypeUtil;
 import org.apache.iceberg.types.Types;
 
 public class BoundReference<T> implements Reference {
   private final int fieldId;
-  private final Type type;
+  private final Accessor<StructLike> accessor;
   private final int pos;
 
-  BoundReference(Types.StructType struct, int fieldId) {
+  BoundReference(Schema schema, int fieldId) {
     this.fieldId = fieldId;
-    this.pos = find(fieldId, struct);
-    this.type = struct.fields().get(pos).type();
+
+    Map<Integer, Accessor<StructLike>> accessors = lazyIdToAccessor(schema);
+
+    this.accessor = accessors.get(fieldId);
+
+    // only look for top level field position
+    this.pos = findTopFieldPos(fieldId, schema.asStruct());
+
   }
 
-  private int find(int fieldId, Types.StructType struct) {
+
+  private int findTopFieldPos(int fieldId, Types.StructType struct) {
     List<Types.NestedField> fields = struct.fields();
     for (int i = 0; i < fields.size(); i += 1) {
       if (fields.get(i).fieldId() == fieldId) {
         return i;
       }
     }
-    throw new ValidationException(
-        "Cannot find top-level field id %d in struct: %s", fieldId, struct);
+    return -1;
   }
 
   public Type type() {
-    return type;
+    return accessor.type();
   }
 
   public int fieldId() {
     return fieldId;
   }
 
   public int pos() {
+    if (pos == -1) {
 
 Review comment:
   I need some clarification here. The evaluator in question is 
`InclusiveManifestEvaluator`, which evaluates partition fields to find matching 
manifests, so `ref.pos()` will refer to a position among the partition fields. AFAIK, 
partition stats are kept separately in snapshot files, and regular fields won't 
show up in this partition summary list. Is this an issue when filtering on a 
nested field in the schema? If so, is the partition source id always used to 
reference that field in the schema? Can I assume this for logical partitions as 
well?
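
To make the question concrete, here is a minimal sketch of a position-based lookup into a partition summary list. It is not Iceberg code: `PartitionSummaryLookup`, `FieldSummary`, and `mightContain` are hypothetical names, the bounds are made-up integers, and treating a position of -1 (a non-top-level field) as "cannot prune" is an assumption for illustration only.

```java
import java.util.Arrays;
import java.util.List;

public class PartitionSummaryLookup {
  // Hypothetical stand-in for one entry of a manifest's partition summary list.
  static final class FieldSummary {
    final boolean containsNull;
    final int lowerBound;
    final int upperBound;

    FieldSummary(boolean containsNull, int lowerBound, int upperBound) {
      this.containsNull = containsNull;
      this.lowerBound = lowerBound;
      this.upperBound = upperBound;
    }
  }

  // Hypothetical helper: returns true if the manifest might contain `value`
  // for the partition field at position `pos`.
  static boolean mightContain(List<FieldSummary> summaries, int pos, int value) {
    if (pos < 0 || pos >= summaries.size()) {
      // Assumption: no summary exists for this reference (for example a nested
      // field with pos == -1), so an inclusive check cannot rule the manifest out.
      return true;
    }
    FieldSummary summary = summaries.get(pos);
    return value >= summary.lowerBound && value <= summary.upperBound;
  }

  public static void main(String[] args) {
    // One partition field with illustrative bounds [10, 20].
    List<FieldSummary> summaries = Arrays.asList(new FieldSummary(true, 10, 20));
    System.out.println(mightContain(summaries, 0, 15));   // inside the bounds
    System.out.println(mightContain(summaries, 0, 21));   // outside the bounds
    System.out.println(mightContain(summaries, -1, 21));  // no position, no pruning
  }
}
```

The point of the sketch is only that the index used for the lookup has to be a position in the partition spec, not a position in the table schema.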
   
   For example, this is a partition summary in a snapshot, which is used by that 
evaluator. It is separate from the stats kept on data schema fields.
   ```
   "partitions": {
       "array": [
         {
           "contains_null": true,
           "lower_bound": {
             "bytes": "\u0013\u0000\u0000\u0000"
           },
           "upper_bound": {
             "bytes": "\u001e\u0000\u0000\u0000"
           }
         }
       ]
     }
   ``` 
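
As a reading aid for the bounds above, here is a small sketch that decodes them, assuming the partition field is an `int` and that the bound bytes are a 4-byte little-endian value. The type of the field is not stated in this comment, so that is an assumption, and `BoundDecoder` and `decodeInt` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class BoundDecoder {
  // Decodes a 4-byte little-endian buffer into an int.
  // Assumption: the partition field behind the summary is int-typed.
  static int decodeInt(byte[] bytes) {
    return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
  }

  public static void main(String[] args) {
    // "\u0013\u0000\u0000\u0000" and "\u001e\u0000\u0000\u0000" from the summary.
    byte[] lower = {0x13, 0x00, 0x00, 0x00};
    byte[] upper = {0x1e, 0x00, 0x00, 0x00};
    System.out.println("lower_bound = " + decodeInt(lower)); // lower_bound = 19
    System.out.println("upper_bound = " + decodeInt(upper)); // upper_bound = 30
  }
}
```

Under that assumption the summary describes a single partition field whose values in the manifest range from 19 to 30 and may include nulls.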

