rdblue opened a new issue #280: Add persistent IDs to partition fields
URL: https://github.com/apache/incubator-iceberg/issues/280
 
 
   Partition fields are assigned IDs so that they can be stored in manifest 
files. ID assignment is done in 
[`PartitionSpec#partitionType()`](https://github.com/apache/incubator-iceberg/blob/master/api/src/main/java/org/apache/iceberg/PartitionSpec.java#L104-L117),
which assigns IDs to each field starting at 1000.
   
   This assignment scheme reuses IDs across partition specs. Because each manifest 
file is written for a single partition spec, the reuse doesn't cause problems 
within a manifest, even when multiple specs exist. It does cause problems in the 
`entries` and `files` metadata tables, because the data file partition struct may 
have a different schema in different manifest files while reusing the same IDs.
   
   For example, if part of a table is partitioned by `days(ts)` and another 
part is partitioned by `hours(ts)`, both of these will show up in the `entries` 
table's `partition` struct with ID 1000.
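
The collision can be illustrated with a minimal Python sketch. It is not Iceberg's Java implementation: `assign_ids_per_spec` and the transform strings are hypothetical names that only mimic the numbering behavior described above, where every spec restarts at 1000.

```python
PARTITION_ID_START = 1000  # first partition field ID, per the description above


def assign_ids_per_spec(transforms):
    """Hypothetical stand-in for the current scheme: numbering restarts for every spec."""
    return {PARTITION_ID_START + i: t for i, t in enumerate(transforms)}


spec_0 = assign_ids_per_spec(["days(ts)"])
spec_1 = assign_ids_per_spec(["hours(ts)"])

# Both specs claim ID 1000 for different transforms, so a metadata table that
# merges partition tuples from both specs cannot tell them apart by ID alone.
colliding_ids = sorted(set(spec_0) & set(spec_1))
print(colliding_ids)
```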
   
   A simple solution is to assign partition field IDs starting at 1000 across 
all table specs and keep the last assigned ID in table metadata. This would 
ensure that partition tuples will be read correctly in metadata tables when a 
table has multiple partition specs. In the example above, `days(ts)` would be 
assigned ID 1000, and when the second partition spec is added, `hours(ts)` would 
be assigned ID 1001.
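
A rough Python sketch of the proposed scheme follows. The class `TableMetadataSketch`, its attribute `last_assigned_partition_id`, and the method `add_spec` are hypothetical names chosen for illustration and are not part of Iceberg's API. The sketch also assumes that every field in a newly added spec receives a fresh ID, which the proposal above does not spell out.

```python
PARTITION_ID_START = 1000  # first partition field ID, per the description above


class TableMetadataSketch:
    """Hypothetical model of table metadata that tracks the last assigned partition field ID."""

    def __init__(self):
        # Nothing assigned yet, so the first field added will receive ID 1000.
        self.last_assigned_partition_id = PARTITION_ID_START - 1
        self.specs = []  # each spec maps field ID -> transform

    def add_spec(self, transforms):
        """Assign IDs from a counter shared by all specs (assumption: always a fresh ID)."""
        spec = {}
        for transform in transforms:
            self.last_assigned_partition_id += 1
            spec[self.last_assigned_partition_id] = transform
        self.specs.append(spec)
        return spec


metadata = TableMetadataSketch()
first = metadata.add_spec(["days(ts)"])    # {1000: "days(ts)"}
second = metadata.add_spec(["hours(ts)"])  # {1001: "hours(ts)"}
```

Because the counter lives with the table rather than with each spec, no two specs can hand out the same ID, which is what lets a merged `partition` struct keep `days(ts)` and `hours(ts)` as distinct fields.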
