rdblue commented on a change in pull request #2952:
URL: https://github.com/apache/iceberg/pull/2952#discussion_r685370873
##########
File path: api/src/main/java/org/apache/iceberg/types/PruneColumns.java
##########
@@ -77,7 +95,12 @@ public Type struct(Types.StructType struct, List<Type>
fieldResults) {
@Override
public Type field(Types.NestedField field, Type fieldResult) {
if (selected.contains(field.fieldId())) {
- return field.type();
+ if (onlySelected) {
+ // This may be an empty struct, if it is use the fieldResult which
will be an emptyStruct
+ return fieldResult == null ? field.type() : fieldResult;
+ } else {
+ return field.type();
+ }
Review comment:
Given my comment above about field selection behavior, I think that this
only needs to change the behavior when `selectEmptyStructs` is true (explicitly
selected structs are empty, unless some fields are also selected) and
`fieldResult` is `null` (no fields were selected).
I think it should be:
```java
if (selectEmptyStructs && field.type().isStructType()) {
// the struct was selected, ensure at least an empty struct is
returned
if (fieldResult == null) {
// no sub-fields were selected but the struct was, return an empty
struct
return Types.StructType.of();
} else {
// sub-fields were selected so return the projected struct
return fieldResult;
}
} else {
// selecting full structs and the struct was explicitly selected, so
ignore fieldResult and return the original
return field.type();
}
```
This avoids modifying `struct` to return an empty struct since its behavior
is correct and we just need to replace `null` with an empty struct.
This also changes only the behavior for structs because it is checking
`field.type().isStructType()`. I think there is an open question about how to
handle maps and lists that are explicitly selected. I had been assuming that
the config boolean would change behavior for only structs, but it seems strange
to have different behavior for maps and lists. What should happen when
selecting a list where the element is a struct? What if `2: locations list<3:
struct<...>>` and `selected={2}`? Should this select `list<struct<>>` or
`list<struct<...>>`?
My current thinking is that `project` should fail if inner IDs are selected
and `selectEmptyStructs` is `true`. Inner IDs should be limited to structs only
to avoid the case where a map or list is explicitly selected. But those cases
must be allowed when selecting full types. As a result, I think I would change
the boolean option to `selectFullTypes` instead. That way it is clear that it
applies to all types and _not_ just structs.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]