szlta commented on PR #4662: URL: https://github.com/apache/iceberg/pull/4662#issuecomment-1125098416
@szehon-ho I think we were probably talking about the same thing: not generating a new ID for a PartitionField that was already seen. I have amended my change accordingly.

I looked into this in some detail and found that this already happens, in part, when we remove some of the PartitionFields:

- we start with `spec0: 1000: identity(a), 1001: identity(b)`
- then if we update the spec so that only `a` remains, `a` doesn't get assigned a new field ID, although we do create a new spec: `spec1: 1000: identity(a)`
- now if we re-add `b` and add `c`, then `b` would get assigned ID 1002 and `c` would get ID 1003
- I propose we look through the past specs to see if `b` already participated as a partition field (based on transform function and source IDs) and recycle the old ID, so we end up with: `spec2: 1000: identity(a), 1001: identity(b), 1002: identity(c)`

Looking at the tests this should work; the only failure I get now is in `TestMetadataTableScans.testFilesTableScanWithDroppedPartition`:

```
Expected :struct<1000: data_bucket: optional int, 1001: data_bucket_16: optional int, 1002: data_trunc_2: optional string>
Actual   :struct<1000: data_bucket_16: optional int, 1001: data_trunc_2: optional string>
```

This made me think, though: it seems we have two different ways of generating the partition field column name. https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/BaseUpdatePartitionSpec.java#L460 generates `data_bucket_16`, while https://github.com/apache/iceberg/blob/master/api/src/main/java/org/apache/iceberg/PartitionSpec.java#L484 generates `data_bucket`. Does anyone know why this discrepancy exists? My gut feeling is that the former is the correct way.
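
To make the ID-recycling proposal concrete, here is a minimal sketch of the lookup, not the actual change in the PR. `FieldIdRecycling`, `Field`, and `assignFieldId` are hypothetical names, the source column IDs 1, 2, 3 for `a`, `b`, `c` are assumed, and the transform is compared as a plain string for simplicity.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration; these are not Iceberg's real classes.
class FieldIdRecycling {

  // One partition field: its assigned field ID, the source column ID, and the transform name.
  static final class Field {
    final int fieldId;
    final int sourceId;
    final String transform;

    Field(int fieldId, int sourceId, String transform) {
      this.fieldId = fieldId;
      this.sourceId = sourceId;
      this.transform = transform;
    }
  }

  // Reuse the ID of a matching field (same source ID and transform) from any past spec;
  // otherwise increment lastAssignedId[0] and return the fresh ID.
  static int assignFieldId(
      List<List<Field>> pastSpecs, int sourceId, String transform, int[] lastAssignedId) {
    for (List<Field> spec : pastSpecs) {
      for (Field field : spec) {
        if (field.sourceId == sourceId && field.transform.equals(transform)) {
          return field.fieldId;
        }
      }
    }
    lastAssignedId[0] += 1;
    return lastAssignedId[0];
  }

  public static void main(String[] args) {
    // spec0: 1000: identity(a), 1001: identity(b); spec1: 1000: identity(a)
    List<Field> spec0 =
        Arrays.asList(new Field(1000, 1, "identity"), new Field(1001, 2, "identity"));
    List<Field> spec1 = Arrays.asList(new Field(1000, 1, "identity"));
    List<List<Field>> pastSpecs = Arrays.asList(spec0, spec1);
    int[] lastAssignedId = {1001};

    // Re-adding b finds it in spec0 and reuses 1001; c was never seen and gets 1002.
    System.out.println(assignFieldId(pastSpecs, 2, "identity", lastAssignedId));
    System.out.println(assignFieldId(pastSpecs, 3, "identity", lastAssignedId));
  }
}
```

With the history from the example above, this would yield `spec2: 1000: identity(a), 1001: identity(b), 1002: identity(c)`.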
