rdblue commented on code in PR #15630:
URL: https://github.com/apache/iceberg/pull/15630#discussion_r3025110907


##########
format/spec.md:
##########
@@ -921,33 +967,33 @@ The atomic operation used to commit metadata depends on 
how tables are tracked a
 
 Table metadata consists of the following fields:
 
-| v1         | v2         | v3         | Field                       | 
Description                                                                     
                                                                                
                                                                                
                                                                                
                                                                 |
-| ---------- | ---------- 
|------------|-----------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| _required_ | _required_ | _required_ | **`format-version`**        | An 
integer version number for the format. Implementations must throw an exception 
if a table's version is higher than the supported version.                      
                                                                                
                                                                                
              |
-| _optional_ | _required_ | _required_ | **`table-uuid`**            | A UUID 
that identifies the table, generated when the table is created. Implementations 
must throw an exception if a table's UUID does not match the expected UUID 
after refreshing metadata.                                                      
                                                                                
                                                               |
-| _required_ | _required_ | _required_ | **`location`**              | The 
table's base location. This is used by writers to determine where to store data 
files, manifest files, and table metadata files.                                
                                                                                
                                                                                
                                                             |
-|            | _required_ | _required_ | **`last-sequence-number`**  | The 
table's highest assigned sequence number, a monotonically increasing long that 
tracks the order of snapshots in a table.                                       
                                                                                
                                                                                
                                                              |
-| _required_ | _required_ | _required_ | **`last-updated-ms`**       | 
Timestamp in milliseconds from the unix epoch when the table was last updated. 
Each table metadata file should update this field just before writing.          
                                                                                
                                                                                
                                                                  |
-| _required_ | _required_ | _required_ | **`last-column-id`**        | An 
integer; the highest assigned column ID for the table. This is used to ensure 
columns are always assigned an unused ID when evolving schemas.                 
                                                                                
                                                                                
                                                                |
-| _required_ |            |            | **`schema`**                | The 
table’s current schema. (**Deprecated**: use `schemas` and `current-schema-id` 
instead)                                                                        
                                                                                
                                                                                
                                                              |
-| _optional_ | _required_ | _required_ | **`schemas`**               | A list 
of schemas, stored as objects with `schema-id`.                                 
                                                                                
                                                                                
                                                                                
                                                          |
-| _optional_ | _required_ | _required_ | **`current-schema-id`**     | ID of 
the table's current schema.                                                     
                                                                                
                                                                                
                                                                                
                                                           |
-| _required_ |            |            | **`partition-spec`**        | The 
table’s current partition spec, stored as only fields. Note that this is used 
by writers to partition data, but is not used when reading because reads use 
the specs stored in manifest files. (**Deprecated**: use `partition-specs` and 
`default-spec-id` instead)                                                      
                                                                   |
-| _optional_ | _required_ | _required_ | **`partition-specs`**       | A list 
of partition specs, stored as full partition spec objects.                      
                                                                                
                                                                                
                                                                                
                                                          |
-| _optional_ | _required_ | _required_ | **`default-spec-id`**       | ID of 
the "current" spec that writers should use by default.                          
                                                                                
                                                                                
                                                                                
                                                           |
-| _optional_ | _required_ | _required_ | **`last-partition-id`**     | An 
integer; the highest assigned partition field ID across all partition specs for 
the table. This is used to ensure partition fields are always assigned an 
unused ID when evolving specs.                                                  
                                                                                
                                                                    |
-| _optional_ | _optional_ | _optional_ | **`properties`**            | A 
string to string map of table properties. This is used to control settings that 
affect reading and writing and is not intended to be used for arbitrary 
metadata. For example, `commit.retry.num-retries` is used to control the number 
of commit retries.                                                              
                                                                       |
-| _optional_ | _optional_ | _optional_ | **`current-snapshot-id`**   | `long` 
ID of the current table snapshot; must be the same as the current ID of the 
`main` branch in `refs`.                                                        
                                                                                
                                                                                
                                                              |
-| _optional_ | _optional_ | _optional_ | **`snapshots`**             | A list 
of valid snapshots. Valid snapshots are snapshots for which all data files 
exist in the file system. A data file must not be deleted from the file system 
until the last snapshot in which it was listed is garbage collected.            
                                                                                
                                                                |
-| _optional_ | _optional_ | _optional_ | **`snapshot-log`**          | A list 
(optional) of timestamp and snapshot ID pairs that encodes changes to the 
current snapshot for the table. Each time the current-snapshot-id is changed, a 
new entry should be added with the last-updated-ms and the new 
current-snapshot-id. When snapshots are expired from the list of valid 
snapshots, all entries before a snapshot that has expired should be removed.    
          |
-| _optional_ | _optional_ | _optional_ | **`metadata-log`**          | A list 
(optional) of timestamp and metadata file location pairs that encodes changes 
to the previous metadata files for the table. Each time a new metadata file is 
created, a new entry of the previous metadata file location should be added to 
the list. Tables can be configured to remove oldest metadata log entries and 
keep a fixed-size log of the most recent entries after a commit. |
-| _optional_ | _required_ | _required_ | **`sort-orders`**           | A list 
of sort orders, stored as full sort order objects.                              
                                                                                
                                                                                
                                                                                
                                                          |
-| _optional_ | _required_ | _required_ | **`default-sort-order-id`** | Default 
sort order id of the table. Note that this could be used by writers, but is not 
used when reading because reads use the specs stored in manifest files.         
                                                                                
                                                                                
                                                         |
-|            | _optional_ | _optional_ | **`refs`**                  | A map 
of snapshot references. The map keys are the unique snapshot reference names in 
the table, and the map values are snapshot reference objects. There is always a 
`main` branch reference pointing to the `current-snapshot-id` even if the 
`refs` map is null.                                                             
                                                                 |
-| _optional_ | _optional_ | _optional_ | **`statistics`**            | A list 
(optional) of [table statistics](#table-statistics).                            
                                                                                
                                                                                
                                                                                
                                                          |
-| _optional_ | _optional_ | _optional_ | **`partition-statistics`**  | A list 
(optional) of [partition statistics](#partition-statistics).                    
                                                                                
                                                                                
                                                                                
                                                          |
-|            |            | _required_ | **`next-row-id`**           | A 
`long` higher than all assigned row IDs; the next snapshot's `first-row-id`. 
See [Row Lineage](#row-lineage).                                                
                                                                                
                                                                                
                                                                  |
-|            |            | _optional_ | **`encryption-keys`**       | A list 
(optional) of [encryption keys](#encryption-keys) used for table encryption. |
+| v1         | v2         | v3         | v4         | Field                    
   | Description                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                      |
+| ---------- | ---------- 
|------------|------------|-----------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| _required_ | _required_ | _required_ | _required_ | **`format-version`**     
   | An integer version number for the format. Implementations must throw an 
exception if a table's version is higher than the supported version.            
                                                                                
                                                                                
                                                                                
                                                                                
                         |
+| _optional_ | _required_ | _required_ | _required_ | **`table-uuid`**         
   | A UUID that identifies the table, generated when the table is created. 
Implementations must throw an exception if a table's UUID does not match the 
expected UUID after refreshing metadata.                                        
                                                                                
                                                                                
                                                                                
                             |
+| _required_ | _required_ | _required_ | _optional_ | **`location`**           
   | The table's base location. This is used by writers to determine where to 
store data files, manifest files, and table metadata files. Must be an absolute 
path when present. In v4 and later, this field is optional; when omitted, the 
table location is provided (see [Table Location 
Specification](#table-location-specification)).                                 
                                                                                
                                                          |
+|            | _required_ | _required_ | _required_ | 
**`last-sequence-number`**  | The table's highest assigned sequence number, a 
monotonically increasing long that tracks the order of snapshots in a table.    
                                                                                
                                                                                
                                                                                
                                                                                
                                                 |
+| _required_ | _required_ | _required_ | _required_ | **`last-updated-ms`**    
   | Timestamp in milliseconds from the unix epoch when the table was last 
updated. Each table metadata file should update this field just before writing. 
                                                                                
                                                                                
                                                                                
                                                                                
                           |
+| _required_ | _required_ | _required_ | _required_ | **`last-column-id`**     
   | An integer; the highest assigned column ID for the table. This is used to 
ensure columns are always assigned an unused ID when evolving schemas.          
                                                                                
                                                                                
                                                                                
                                                                                
                       |
+| _required_ |            |            |            | **`schema`**             
   | The table’s current schema. (**Deprecated**: use `schemas` and 
`current-schema-id` instead)                                                    
                                                                                
                                                                                
                                                                                
                                                                                
                                  |
+| _optional_ | _required_ | _required_ | _required_ | **`schemas`**            
   | A list of schemas, stored as objects with `schema-id`.                     
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                      |
+| _optional_ | _required_ | _required_ | _required_ | **`current-schema-id`**  
   | ID of the table's current schema.                                          
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                      |
+| _required_ |            |            |            | **`partition-spec`**     
   | The table’s current partition spec, stored as only fields. Note that this 
is used by writers to partition data, but is not used when reading because 
reads use the specs stored in manifest files. (**Deprecated**: use 
`partition-specs` and `default-spec-id` instead)                                
                                                                                
                                                                                
                                         |
+| _optional_ | _required_ | _required_ | _required_ | **`partition-specs`**    
   | A list of partition specs, stored as full partition spec objects.          
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                      |
+| _optional_ | _required_ | _required_ | _required_ | **`default-spec-id`**    
   | ID of the "current" spec that writers should use by default.               
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                      |
+| _optional_ | _required_ | _required_ | _required_ | **`last-partition-id`**  
   | An integer; the highest assigned partition field ID across all partition 
specs for the table. This is used to ensure partition fields are always 
assigned an unused ID when evolving specs.                                      
                                                                                
                                                                                
                                                                                
                                |
+| _optional_ | _optional_ | _optional_ | _optional_ | **`properties`**         
   | A string to string map of table properties. This is used to control 
settings that affect reading and writing and is not intended to be used for 
arbitrary metadata. For example, `commit.retry.num-retries` is used to control 
the number of commit retries.                                                   
                                                                                
                                                                                
                                  |
+| _optional_ | _optional_ | _optional_ | _optional_ | 
**`current-snapshot-id`**   | `long` ID of the current table snapshot; must be 
the same as the current ID of the `main` branch in `refs`.                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                |
+| _optional_ | _optional_ | _optional_ | _optional_ | **`snapshots`**          
   | A list of valid snapshots. Valid snapshots are snapshots for which all 
data files exist in the file system. A data file must not be deleted from the 
file system until the last snapshot in which it was listed is garbage 
collected.                                                                      
                                                                                
                                                                                
                                      |
+| _optional_ | _optional_ | _optional_ | _optional_ | **`snapshot-log`**       
   | A list (optional) of timestamp and snapshot ID pairs that encodes changes 
to the current snapshot for the table. Each time the current-snapshot-id is 
changed, a new entry should be added with the last-updated-ms and the new 
current-snapshot-id. When snapshots are expired from the list of valid 
snapshots, all entries before a snapshot that has expired should be removed.    
                                                                                
                                          |
+| _optional_ | _optional_ | _optional_ | _optional_ | **`metadata-log`**       
   | A list (optional) of timestamp and metadata file location pairs that 
encodes changes to the previous metadata files for the table. Each time a new 
metadata file is created, a new entry of the previous metadata file location 
should be added to the list. Tables can be configured to remove oldest metadata 
log entries and keep a fixed-size log of the most recent entries after a 
commit. In v4, metadata file locations may be relative paths resolved against 
the table location (see [Paths](#paths)). |

Review Comment:
   It's also a little surprising that this is updated to note relative/absolute 
handling, but other places in the spec aren't. I am fine with not updating each 
path/location, or linking to a section on file locations consistently. I would 
definitely have a section on file locations in the spec section and in that 
section discuss the difference between v3 and earlier ("must be 
fully-qualified") and v4 and later (must follow absolute/relative requirements).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to