rdblue commented on a change in pull request #2952:
URL: https://github.com/apache/iceberg/pull/2952#discussion_r685370873



##########
File path: api/src/main/java/org/apache/iceberg/types/PruneColumns.java
##########
@@ -77,7 +95,12 @@ public Type struct(Types.StructType struct, List<Type> 
fieldResults) {
   @Override
   public Type field(Types.NestedField field, Type fieldResult) {
     if (selected.contains(field.fieldId())) {
-      return field.type();
+      if (onlySelected) {
+        // This may be an empty struct, if it is use the fieldResult which 
will be an emptyStruct
+        return fieldResult == null ? field.type() : fieldResult;
+      } else {
+        return field.type();
+      }

Review comment:
       Given my comment above about field selection behavior, I think that this 
only needs to change the behavior when `selectEmptyStructs` is true (explicitly 
selected structs are empty, unless some fields are also selected) and 
`fieldResult` is `null` (no fields were selected).
   
   I think it should be:
   
   ```java
         if (selectEmptyStructs && field.type().isStructType()) {
           // the struct was selected, ensure at least an empty struct is 
returned
           if (fieldResult == null) {
             // no sub-fields were selected but the struct was, return an empty 
struct
             return Types.StructType.of();
           } else {
             // sub-fields were selected so return the projected struct
             return fieldResult;
           }
         } else {
           // selecting full structs and the struct was explicitly selected, so 
ignore fieldResult and return the original
           return field.type();
         }
   ```
   
   This avoids modifying `struct` to return an empty struct since its behavior 
is correct and we just need to replace `null` with an empty struct.
   
   This also changes only the behavior for structs because it is checking 
`field.type().isStructType()`. I think there is an open question about how to 
handle maps and lists that are explicitly selected. I had been assuming that 
the config boolean would change behavior for only structs, but it seems strange 
to have different behavior for maps and lists. What should happen when 
selecting a list where the element is a struct? What if `2: locations list<3: 
struct<...>>` and `selected={2}`? Should this select `list<struct<>>` or 
`list<struct<...>>`?
   
   My current thinking is that `project` should fail if inner IDs are selected 
and `selectEmptyStructs` is `true`. Inner IDs should be limited to structs only 
to avoid the case where a map or list is explicitly selected. But those cases 
must be allowed when selecting full types. As a result, I think I would change 
the boolean option to `selectFullTypes` instead. That way it is clear that it 
applies to all types and _not_ just structs.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to