rdblue commented on a change in pull request #2952:
URL: https://github.com/apache/iceberg/pull/2952#discussion_r685360464
##########
File path: api/src/main/java/org/apache/iceberg/types/PruneColumns.java
##########
@@ -47,18 +60,21 @@ public Type struct(Types.StructType struct, List<Type>
fieldResults) {
for (int i = 0; i < fieldResults.size(); i += 1) {
Types.NestedField field = fields.get(i);
Type projectedType = fieldResults.get(i);
- if (field.type() == projectedType) {
- // uses identity because there is no need to check structure. if
identity doesn't match
- // then structure should not either.
- selectedFields.add(field);
- } else if (projectedType != null) {
- sameTypes = false; // signal that some types were altered
- if (field.isOptional()) {
- selectedFields.add(
- Types.NestedField.optional(field.fieldId(), field.name(),
projectedType, field.doc()));
- } else {
- selectedFields.add(
- Types.NestedField.required(field.fieldId(), field.name(),
projectedType, field.doc()));
+ boolean includeField = !onlySelected ||
selected.contains(field.fieldId());
Review comment:
I don't think this is right. I think that the only thing we need to
change is the logic for selecting a struct itself, in `field`. If a field makes
it to this point because it is in `fieldResults` then it (or one of its
children) was selected.
For example, if the schema is `1: id bigint, 2: location struct<3: lat
float, 4: long float, 5: alt float>` and `selected` includes `{3, 4}` then the
schema returned must include both `lat` and `long` fields, even if `location`
is explicitly selected and empty structs should be returned.
The behavior difference is in the case of `selected={2, 3, 4}`:
* When selecting empty structs, the result should be `2: location struct<3:
lat float, 4: long float>`
* When selecting full structs, the result should be `2: location struct<3:
lat float, 4: long float, 5: alt float>`
Similarly, if `selected={2}` the results should be:
* When selecting empty structs, the result should be `2: location struct<>`
* When selecting full structs, the result should be `2: location struct<3:
lat float, 4: long float, 5: alt float>`
So I don't think we need to check `onlySelected` when considering fields.
This should not be a way to prune explicitly selected fields.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]