jun-he commented on a change in pull request #1141:
URL: https://github.com/apache/iceberg/pull/1141#discussion_r567309330
##########
File path: site/docs/spec.md
##########
@@ -218,10 +218,31 @@ Partition specs capture the transform from table data to
partition values. This
| **`month`** | Extract a date or timestamp month, as months from
1970-01-01 | `date`, `timestamp(tz)`
| `int` |
| **`day`** | Extract a date or timestamp day, as days from 1970-01-01
| `date`, `timestamp(tz)`
| `date` |
| **`hour`** | Extract a timestamp hour, as hours from 1970-01-01
00:00:00 | `timestamp(tz)`
| `int` |
+| **`void`** | Always produces `null` (the void transform)
| Any
| Source type |
All transforms must return `null` for a `null` input value.
+#### Partition Field ID handling
+
+A partition field ID is an integer used to identify a partition field in
Iceberg manifest files.
+Field IDs are required in v2 and optional in v1.
+
+About compatibility between v1 and v2 tables:
+
+* For backward compatibility, if field ids are missing in a table metadata,
the reference implementation will sequentially generate ids for each field
starting at `1000` based on its position in the list of fields.
+* For forward compatibility, if field ids are not supported but present in the
metadata, old versions of the reference implementation will ignore those field
ids and then regenerate an auto-increment field id starting at 1000 for every
partition field.
+
+While working with a v1 table, the v1 partition spec does not require
consistent field IDs and then they are assigned when creating each manifest
file.
+When creating a manifest, each field of the partition spec will be assigned an
ID starting at `1000`, and there is no guarantees about ID reuse across files.
+But as long as the partition spec will not be evolved, IDs will be consistent.
+
+This has a few implications:
+* Older writers may erase partition field IDs when writing to a v1 table. This
does not happen to v2 tables because writers will fail to read or write a v2
table.
+* Metadata tables need consistent field IDs across manifest files. To achieve
it, for v1 tables, please evolve the partition spec according to the
recommendations,
+i.e. don't reorder or delete partition fields; replace fields with with `void`
transform; add new fields to the end. Note that renames are OK and also note
that this does not apply for v2 tables.
Review comment:
Updated it with the reason (i.e. `v2 table tracks and persists the last
assigned partition field id`).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]