rdblue opened a new issue #280: Add persistent IDs to partition fields
URL: https://github.com/apache/incubator-iceberg/issues/280

Partition fields are assigned IDs when they are stored in manifest files. ID assignment is done in [`PartitionSpec#partitionType()`](https://github.com/apache/incubator-iceberg/blob/master/api/src/main/java/org/apache/iceberg/PartitionSpec.java#L104-L117), which assigns an ID to each field, starting at 1000.

This assignment scheme reuses IDs across partition specs. Because a manifest file is written for a single partition spec, this doesn't cause problems within a manifest, even when multiple specs exist. It does cause problems in the `entries` and `files` metadata tables, because the data file partition may have a different schema in each manifest file while reusing the same IDs. For example, if part of a table is partitioned by `days(ts)` and another part is partitioned by `hours(ts)`, both of these will show up in the `entries` table's `partition` struct with ID 1000.

A simple solution is to assign partition field IDs starting at 1000 across all table specs and keep the last assigned ID in table metadata. This would ensure that partition tuples are read correctly in metadata tables when a table has multiple partition specs. In the example above, `days(ts)` would be assigned ID 1000, and when the second partition spec is added, `hours(ts)` would be assigned ID 1001.
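
The proposed scheme can be sketched roughly as follows. This is only an illustration, not Iceberg code: `PartitionFieldIdAssigner`, `assignIds`, and `lastAssignedId` are hypothetical names, partition fields are represented as plain strings, and the sketch assumes every field of a newly added spec receives a fresh ID (the issue does not say whether a field repeated across specs would keep its earlier ID).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Iceberg API: hands out partition field
// IDs that are unique across every partition spec of a table.
class PartitionFieldIdAssigner {
  // Partition field IDs start at 1000, as described in the issue.
  static final int FIRST_PARTITION_FIELD_ID = 1000;

  // Last ID handed out. Under the proposal, this value would be kept in table
  // metadata so that it survives across spec changes.
  private int lastAssignedId;

  // New table: nothing has been assigned yet.
  PartitionFieldIdAssigner() {
    this(FIRST_PARTITION_FIELD_ID - 1);
  }

  // Existing table: resume from the last ID recorded in (assumed) table metadata.
  PartitionFieldIdAssigner(int lastAssignedId) {
    this.lastAssignedId = lastAssignedId;
  }

  // Assigns one fresh ID per field of a newly added spec, in field order.
  List<Integer> assignIds(List<String> partitionFields) {
    List<Integer> ids = new ArrayList<>();
    for (int i = 0; i < partitionFields.size(); i++) {
      lastAssignedId += 1;
      ids.add(lastAssignedId);
    }
    return ids;
  }

  int lastAssignedId() {
    return lastAssignedId;
  }

  public static void main(String[] args) {
    PartitionFieldIdAssigner assigner = new PartitionFieldIdAssigner();
    // First spec: days(ts); second spec added later: hours(ts).
    System.out.println("days(ts)  -> " + assigner.assignIds(List.of("days(ts)")));
    System.out.println("hours(ts) -> " + assigner.assignIds(List.of("hours(ts)")));
  }
}
```

With the two specs from the example, the first call yields ID 1000 for `days(ts)` and the second yields 1001 for `hours(ts)`, so the two fields no longer collide in a combined `partition` struct.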
