rdblue commented on a change in pull request #2953:
URL: https://github.com/apache/iceberg/pull/2953#discussion_r707615179



##########
File path: api/src/main/java/org/apache/iceberg/types/TypeUtil.java
##########
@@ -130,13 +130,13 @@ public static Schema select(Schema schema, Set<Integer> fieldIds) {
   public static Types.StructType selectNot(Types.StructType struct, Set<Integer> fieldIds) {
     Set<Integer> projectedIds = getIdsInternal(struct);
     projectedIds.removeAll(fieldIds);
-    return select(struct, projectedIds);
+    return project(struct, projectedIds);

Review comment:
   I think I agree with the decision not to change the behavior of this 
method, even though the strict opposite of "select" would be to remove a 
struct entirely when its ID is passed in `fieldIds`.
   
   But I don't think that `project` is quite correct either. Consider the 
example schema `1: id bigint, 2: location struct<3: lat double, 4: long 
double>`. Previously, `selectNot(t, set(3, 4))` would produce `1: id bigint` 
and omit the location entirely. Using `project` with the updated 
`GetProjectedIds`, the ID set collected before removal will be `{1, 2, 3, 4}` 
rather than `{1, 3, 4}`, so `{1, 2}` is left once 3 and 4 are removed. The 
same call would then produce `1: id bigint, 2: location struct<>`, which 
introduces a new bug: an unexpected, empty `location` field.
   
   To clean this up, I think we need a version of `GetProjectedIds` that 
doesn't select structs and uses the old behavior.
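
A rough sketch of the difference, using a toy schema model rather than Iceberg's actual classes. `SelectNotSketch`, `Field`, `collectIds`, and `prune` are all made up for illustration, and the `includeStructIds` flag is only my assumption about how the old and updated ID collection differ:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Toy model only: none of these names are Iceberg classes or methods.
public class SelectNotSketch {

  static final class Field {
    final int id;
    final String name;
    final String type;          // primitive type name; null for structs
    final List<Field> children; // null for primitives

    Field(int id, String name, String type, List<Field> children) {
      this.id = id;
      this.name = name;
      this.type = type;
      this.children = children;
    }

    boolean isStruct() {
      return children != null;
    }

    @Override
    public String toString() {
      String rendered = isStruct() ? "struct<" + render(children) + ">" : type;
      return id + ": " + name + " " + rendered;
    }
  }

  static String render(List<Field> fields) {
    return fields.stream().map(Field::toString).collect(Collectors.joining(", "));
  }

  // Collects field IDs. With includeStructIds=false only leaf IDs are returned
  // (assumed to mirror the old behavior); with true, struct IDs are added too.
  static Set<Integer> collectIds(List<Field> fields, boolean includeStructIds) {
    Set<Integer> ids = new HashSet<>();
    for (Field f : fields) {
      if (f.isStruct()) {
        if (includeStructIds) {
          ids.add(f.id);
        }
        ids.addAll(collectIds(f.children, includeStructIds));
      } else {
        ids.add(f.id);
      }
    }
    return ids;
  }

  // Project-style pruning: a struct survives if its own ID is selected
  // (even with no children left) or if any of its children survives.
  static List<Field> prune(List<Field> fields, Set<Integer> ids) {
    List<Field> kept = new ArrayList<>();
    for (Field f : fields) {
      if (!f.isStruct()) {
        if (ids.contains(f.id)) {
          kept.add(f);
        }
      } else {
        List<Field> children = prune(f.children, ids);
        if (ids.contains(f.id) || !children.isEmpty()) {
          kept.add(new Field(f.id, f.name, null, children));
        }
      }
    }
    return kept;
  }

  // Same shape as selectNot in the diff: collect IDs, remove fieldIds, prune.
  static String selectNot(List<Field> schema, Set<Integer> fieldIds, boolean includeStructIds) {
    Set<Integer> projectedIds = collectIds(schema, includeStructIds);
    projectedIds.removeAll(fieldIds);
    return render(prune(schema, projectedIds));
  }

  static List<Field> exampleSchema() {
    Field lat = new Field(3, "lat", "double", null);
    Field lng = new Field(4, "long", "double", null);
    return Arrays.asList(
        new Field(1, "id", "bigint", null),
        new Field(2, "location", null, Arrays.asList(lat, lng)));
  }

  public static void main(String[] args) {
    Set<Integer> drop = new HashSet<>(Arrays.asList(3, 4));
    System.out.println(selectNot(exampleSchema(), drop, false)); // 1: id bigint
    System.out.println(selectNot(exampleSchema(), drop, true));  // 1: id bigint, 2: location struct<>
  }
}
```

With leaf-only IDs, nothing keeps `location` alive once 3 and 4 are removed. Once struct IDs are collected as well, ID 2 stays in the set and the empty struct comes through.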




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


