sriharshaj commented on code in PR #12634: URL: https://github.com/apache/iceberg/pull/12634#discussion_r2353883644
########## parquet/src/main/java/org/apache/iceberg/parquet/PruneColumns.java: ########## @@ -90,11 +90,11 @@ public Type struct(StructType expected, GroupType struct, List<Type> fields) { Type originalField = struct.getType(i); Type field = fields.get(i); Integer fieldId = getId(originalField); - if (fieldId != null && selectedIds.contains(fieldId)) { - filteredFields.add(originalField); - } else if (field != null) { - filteredFields.add(originalField); + if (field != null) { + filteredFields.add(field); hasChange = true; + } else if (fieldId != null && selectedIds.contains(fieldId)) { Review Comment: @rdblue From my reading of the code, setting `includeStructIds` to false doesn’t consistently add the struct field ID to `selectedIds`. As a result, `(fieldId != null && selectedIds.contains(fieldId))` may never evaluate to true. Also, given the current implementation, when we only look at `expectedSchema` in: ``` TypeUtil.getProjectedIds(expectedSchema); ``` it seems we can’t determine whether a subset of struct fields is selected or the entire struct is selected, since there’s no table schema to diff against. In addition, even if only one nested field of a struct is selected, it looks like `org.apache.iceberg.Schema` can’t be built without the parent struct field ID. One possible way to distinguish "entire struct selected" from "subset selected" would be to omit the struct’s nested fields from the `Schema` when the full struct is selected. Please let me know if I am missing anything. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org