rdblue commented on a change in pull request #1141:
URL: https://github.com/apache/iceberg/pull/1141#discussion_r552210506



##########
File path: site/docs/spec.md
##########
@@ -218,10 +218,27 @@ Partition specs capture the transform from table data to partition values. This
 | **`month`**       | Extract a date or timestamp month, as months from 1970-01-01 | `date`, `timestamp(tz)` | `int`       |
 | **`day`**         | Extract a date or timestamp day, as days from 1970-01-01     | `date`, `timestamp(tz)` | `date`      |
 | **`hour`**        | Extract a timestamp hour, as hours from 1970-01-01 00:00:00  | `timestamp(tz)`         | `int`       |
+| **`void`**        | Always produces `null` (the void transform)                  | Any                     | `null`      |
 
 All transforms must return `null` for a `null` input value.
 
 
+#### Partition Field ID handling
+
+A partition field ID is an integer used to identify a partition field. 
+Field IDs are required in v2 and optional in v1.
+
+About compatibility between v1 and v2 tables:
+
+* For backward compatibility, if field ids are missing in a table metadata, iceberg will sequentially generate ids for each field starting at 1000 based on its position in the list of fields.
+* For forward compatibility, if field ids are not supported but present in the metadata, old versions of the reference implementation will ignore those field ids and then regenerate an auto-increment field id starting at 1000 for every partition field.
+
+While working with a v1 table, field IDs might be reused if removing partition fields from its partition spec.

Review comment:
       I think what this is trying to say is that old versions of the reference implementation did not keep track of field IDs. When creating a manifest, each field of the partition spec was assigned an ID sequentially, starting at 1000, and there were no guarantees about ID reuse across files. As long as the spec was not evolved, though, IDs were consistent.
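
To make the reuse problem concrete, here is a minimal Python sketch of position-based ID assignment. It is not the reference implementation's code: `assign_positional_ids` and the field names are hypothetical, and only the starting value of 1000 comes from the discussion above.

```python
# Hypothetical sketch: IDs derived purely from a field's position in the spec.
FIRST_PARTITION_FIELD_ID = 1000  # starting value mentioned in the discussion


def assign_positional_ids(field_names):
    """Map each partition field name to 1000 + its index in the spec."""
    return {
        name: FIRST_PARTITION_FIELD_ID + index
        for index, name in enumerate(field_names)
    }


# Unevolved spec: every manifest derives the same IDs.
before = assign_positional_ids(["ts_day", "category", "id_bucket"])

# After removing the first field, the surviving fields shift down, so
# ID 1000 now refers to a different field than it did before.
after = assign_positional_ids(["category", "id_bucket"])
```

In this sketch `before["ts_day"]` and `after["category"]` are both 1000, which is the kind of reuse that makes IDs inconsistent across manifests written under different specs.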
   
   That has a few implications:
   
   1. Older writers may erase partition field IDs when writing to a v1 table. This does not happen to v2 tables because older writers will fail to read or write a v2 table.
   2. Metadata tables need consistent field IDs across manifest files. To achieve this for v1 tables, evolve the spec according to the recommendations: don't delete fields, replace them with `void`; only add fields at the end; renames are okay.
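
A rough Python sketch of why those recommendations keep position-derived IDs stable. The spec is modeled as a plain list of `(name, transform)` pairs, and `with_positional_ids`, `drop_field`, `add_field`, and `rename_field` are hypothetical helpers for illustration only, not Iceberg APIs.

```python
def with_positional_ids(spec):
    """Attach IDs the way an ID-unaware writer would: 1000 + position."""
    return [(1000 + i, name, transform) for i, (name, transform) in enumerate(spec)]


def drop_field(spec, name):
    """Do not delete the slot; replace its transform with void."""
    return [(n, "void" if n == name else t) for n, t in spec]


def add_field(spec, name, transform):
    """Only ever append new fields at the end of the spec."""
    return spec + [(name, transform)]


def rename_field(spec, old, new):
    """Renames keep the position, so the derived ID is unchanged."""
    return [(new if n == old else n, t) for n, t in spec]


spec_v1 = [("ts_day", "day"), ("category", "identity")]
spec_v2 = add_field(drop_field(spec_v1, "ts_day"), "id_bucket", "bucket[16]")
spec_v3 = rename_field(spec_v2, "category", "cat")
```

Under these assumptions, the `identity` field keeps ID 1001 in all three versions, so manifests written at different points in the table's history agree on which ID means which field.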



