szlta commented on PR #4662:
URL: https://github.com/apache/iceberg/pull/4662#issuecomment-1125098416

   @szehon-ho I think we were talking about the same thing: not generating a
new ID for a PartitionField that has already been seen. I have amended my
change accordingly. Looking into this in more detail, I found that something
similar already happens when we remove some of the PartitionFields:
   
   - we start with
   `spec0: 1000: identity(a), 1001: identity(b)`
   - if we then update the spec so that only `a` remains, `a` keeps its field
ID rather than being reassigned one, although a new spec is created:
   `spec1: 1000: identity(a)`
   - now if we re-add `b` and `c`, then `b` would get assigned ID 1002 and `c` 
would get ID 1003
   - I propose that we look through the past specs to see whether `b` already
appeared as a partition field (matched by transform function and source ID)
and, if so, recycle its old ID, so that we end up with:
   `spec2: 1000: identity(a), 1001: identity(b), 1002: identity(c)`
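   The proposed lookup could be sketched roughly as follows. This is a minimal
illustration, not Iceberg code: the `Field` record, the `IdAssigner` class and
the source IDs (a=1, b=2, c=3) are hypothetical, and a field counts as already
seen when both its source ID and its transform match a field in some past spec.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionFieldIdRecycling {

  // Hypothetical, simplified partition field (NOT Iceberg's PartitionField).
  // Two fields are treated as the same if source ID and transform both match.
  record Field(int sourceId, String transform, int fieldId) {}

  // Hypothetical assigner: remembers all past specs and the highest ID handed out.
  static final class IdAssigner {
    private final List<List<Field>> pastSpecs = new ArrayList<>();
    private int lastAssignedFieldId;

    IdAssigner(int lastAssignedFieldId) {
      this.lastAssignedFieldId = lastAssignedFieldId;
    }

    void addSpec(List<Field> spec) {
      pastSpecs.add(spec);
    }

    // Reuse the ID of a matching field from any past spec; otherwise allocate a new one.
    int assign(int sourceId, String transform) {
      for (List<Field> spec : pastSpecs) {
        for (Field f : spec) {
          if (f.sourceId() == sourceId && f.transform().equals(transform)) {
            return f.fieldId(); // recycle the previously assigned ID
          }
        }
      }
      return ++lastAssignedFieldId; // never seen before: next fresh ID
    }
  }

  // Replays the a/b/c example with hypothetical source IDs a=1, b=2, c=3.
  static List<Integer> spec2Ids() {
    IdAssigner assigner = new IdAssigner(1001); // spec0 used IDs 1000 and 1001
    assigner.addSpec(List.of(new Field(1, "identity", 1000), new Field(2, "identity", 1001))); // spec0
    assigner.addSpec(List.of(new Field(1, "identity", 1000))); // spec1: only a remains
    return List.of(
        assigner.assign(1, "identity"),  // a
        assigner.assign(2, "identity"),  // b
        assigner.assign(3, "identity")); // c
  }

  public static void main(String[] args) {
    System.out.println(spec2Ids()); // [1000, 1001, 1002]
  }
}
```

   Running `main` replays the example and yields the IDs of `spec2`. A linear
scan over all past specs keeps the sketch short; a real implementation might
instead build a map keyed by (source ID, transform) once.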
   
   Judging by the tests, this should work; the only failure I now get is in
`TestMetadataTableScans.testFilesTableScanWithDroppedPartition`:
   ```
   Expected :struct<1000: data_bucket: optional int, 1001: data_bucket_16: optional int, 1002: data_trunc_2: optional string>
   Actual   :struct<1000: data_bucket_16: optional int, 1001: data_trunc_2: optional string>
   ```
   
   This made me think, though: it seems we have two different ways of
generating the partition field name:
   
https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/BaseUpdatePartitionSpec.java#L460
 generates `data_bucket_16`, while
   
https://github.com/apache/iceberg/blob/master/api/src/main/java/org/apache/iceberg/PartitionSpec.java#L484
 generates `data_bucket`. Does anyone know why this discrepancy exists? My gut
feeling is that the former is correct.
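   For illustration only, the two conventions as they show up in the test
output can be written as two hypothetical helpers (these are not the actual
Iceberg methods). Judging from the Expected/Actual structs, the test seems to
produce the same 16-bucket transform on `data` under both names. Since the
proposed recycling matches fields by transform and source ID rather than by
name, `data_bucket_16` presumably picked up the old ID 1000 of `data_bucket`,
which would explain why the Actual struct has one field fewer.

```java
public class PartitionFieldNames {

  // Convention seen in the "Actual" output (attributed above to
  // BaseUpdatePartitionSpec): the bucket count is part of the name.
  static String nameWithBucketCount(String sourceName, int numBuckets) {
    return sourceName + "_bucket_" + numBuckets;
  }

  // Convention seen in the "Expected" output (attributed above to
  // PartitionSpec): the bucket count is omitted.
  static String nameWithoutBucketCount(String sourceName) {
    return sourceName + "_bucket";
  }

  public static void main(String[] args) {
    // Same transform on the same column, two different names.
    System.out.println(nameWithBucketCount("data", 16)); // data_bucket_16
    System.out.println(nameWithoutBucketCount("data"));  // data_bucket
  }
}
```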


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]