stevenzwu commented on code in PR #14656:
URL: https://github.com/apache/iceberg/pull/14656#discussion_r3197132965


##########
format/spec.md:
##########
@@ -790,33 +793,34 @@ A manifest list is a valid Iceberg data file: files must 
use valid Iceberg forma
 
 Manifest list files store `manifest_file`, a struct with the following fields:
 
-| v1         | v2         | v3         | Field id, name                   | 
Type                                        | Description                       
                                                                                
                                   |
-| ---------- | ---------- 
|------------|----------------------------------|---------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------|
-| _required_ | _required_ | _required_ | **`500 manifest_path`**          | 
`string`                                    | Location of the manifest file     
                                                                                
                                   |
-| _required_ | _required_ | _required_ | **`501 manifest_length`**        | 
`long`                                      | Length of the manifest file in 
bytes                                                                           
                                      |
-| _required_ | _required_ | _required_ | **`502 partition_spec_id`**      | 
`int`                                       | ID of a partition spec used to 
write the manifest; must be listed in table metadata `partition-specs`          
                                      |
-|            | _required_ | _required_ | **`517 content`**                | 
`int` with meaning: `0: data`, `1: deletes` | The type of files tracked by the 
manifest, either data or delete files; 0 for all v1 manifests                   
                                    |
-|            | _required_ | _required_ | **`515 sequence_number`**        | 
`long`                                      | The sequence number when the 
manifest was added to the table; use 0 when reading v1 manifest lists           
                                        |
-|            | _required_ | _required_ | **`516 min_sequence_number`**    | 
`long`                                      | The minimum data sequence number 
of all live data or delete files in the manifest; use 0 when reading v1 
manifest lists                              |
-| _required_ | _required_ | _required_ | **`503 added_snapshot_id`**      | 
`long`                                      | ID of the snapshot where the  
manifest file was added                                                         
                                       |
-| _optional_ | _required_ | _required_ | **`504 added_files_count`**      | 
`int`                                       | Number of entries in the manifest 
that have status `ADDED` (1), when `null` this is assumed to be non-zero        
                                   |
-| _optional_ | _required_ | _required_ | **`505 existing_files_count`**   | 
`int`                                       | Number of entries in the manifest 
that have status `EXISTING` (0), when `null` this is assumed to be non-zero     
                                   |
-| _optional_ | _required_ | _required_ | **`506 deleted_files_count`**    | 
`int`                                       | Number of entries in the manifest 
that have status `DELETED` (2), when `null` this is assumed to be non-zero      
                                   |
-| _optional_ | _required_ | _required_ | **`512 added_rows_count`**       | 
`long`                                      | Number of rows in all of files in 
the manifest that have status `ADDED`, when `null` this is assumed to be 
non-zero                                  |
-| _optional_ | _required_ | _required_ | **`513 existing_rows_count`**    | 
`long`                                      | Number of rows in all of files in 
the manifest that have status `EXISTING`, when `null` this is assumed to be 
non-zero                               |
-| _optional_ | _required_ | _required_ | **`514 deleted_rows_count`**     | 
`long`                                      | Number of rows in all of files in 
the manifest that have status `DELETED`, when `null` this is assumed to be 
non-zero                                |
-| _optional_ | _optional_ | _optional_ | **`507 partitions`**             | 
`list<508: field_summary>` (see below)      | A list of field summaries for 
each partition field in the spec. Each field in the list corresponds to a field 
in the manifest file’s partition spec. |
-| _optional_ | _optional_ | _optional_ | **`519 key_metadata`**           | 
`binary`                                    | Implementation-specific key 
metadata for encryption                                                         
                                         |
-|            |            | _optional_ | **`520 first_row_id`**           | 
`long`                                      | The starting `_row_id` to assign 
to rows added by `ADDED` data files [First Row ID 
Assignment](#first-row-id-assignment)                               |
+=== "v1 - v3"
+    | v1         | v2         | v3         | Field id, name                    
  | Type                                        | Description |
+    | ---------- | ---------- 
|------------|-------------------------------------|---------------------------------------------|-------------|
+    | _required_ | _required_ | _required_ | **`500 manifest_path`**           
  | `string`                                    | Location of the manifest file 
|
+    | _required_ | _required_ | _required_ | **`501 manifest_length`**         
  | `long`                                      | Length of the manifest file 
in bytes |
+    | _required_ | _required_ | _required_ | **`502 partition_spec_id`**       
  | `int`                                       | ID of a partition spec used 
to write the manifest; must be listed in table metadata `partition-specs` |
+    |            | _required_ | _required_ | **`517 content`**                 
  | `int` with meaning: `0: data`, `1: deletes` | The type of files tracked by 
the manifest, either data or delete files; 0 for all v1 manifests |
+    |            | _required_ | _required_ | **`515 sequence_number`**         
  | `long`                                      | The sequence number when the 
manifest was added to the table; use 0 when reading v1 manifest lists |
+    |            | _required_ | _required_ | **`516 min_sequence_number`**     
  | `long`                                      | The minimum data sequence 
number of all live data or delete files in the manifest; use 0 when reading v1 
manifest lists |
+    | _required_ | _required_ | _required_ | **`503 added_snapshot_id`**       
  | `long`                                      | ID of the snapshot where the 
manifest file was added |
+    | _optional_ | _required_ | _required_ | **`504 added_files_count`**       
  | `int`                                       | Number of entries in the 
manifest that have status `ADDED` (1), when `null` this is assumed to be 
non-zero |
+    | _optional_ | _required_ | _required_ | **`505 existing_files_count`**    
  | `int`                                       | Number of entries in the 
manifest that have status `EXISTING` (0), when `null` this is assumed to be 
non-zero |
+    | _optional_ | _required_ | _required_ | **`506 deleted_files_count`**     
  | `int`                                       | Number of entries in the 
manifest that have status `DELETED` (2), when `null` this is assumed to be 
non-zero |
+    | _optional_ | _required_ | _required_ | **`512 added_rows_count`**        
  | `long`                                      | Number of rows in all of 
files in the manifest that have status `ADDED`, when `null` this is assumed to 
be non-zero |
+    | _optional_ | _required_ | _required_ | **`513 existing_rows_count`**     
  | `long`                                      | Number of rows in all of 
files in the manifest that have status `EXISTING`, when `null` this is assumed 
to be non-zero |
+    | _optional_ | _required_ | _required_ | **`514 deleted_rows_count`**      
  | `long`                                      | Number of rows in all of 
files in the manifest that have status `DELETED`, when `null` this is assumed 
to be non-zero |
+    | _optional_ | _optional_ | _optional_ | **`507 partitions`**              
  | `list<508: field_summary>`                  | A list of field summaries for 
each partition field in the spec. Each field in the list corresponds to a field 
in the manifest file’s partition spec. |

Review Comment:
   Content drop: the original Type column read `list<508: field_summary>` 
**(see below)**. The `(see below)` cross-reference to the `field_summary` table 
that follows on line 818 was removed. Restoring it keeps that table 
discoverable from the `partitions` row.



##########
format/spec.md:
##########
@@ -921,33 +926,34 @@ The atomic operation used to commit metadata depends on 
how tables are tracked a
 
 Table metadata consists of the following fields:
 
-| v1         | v2         | v3         | Field                       | 
Description                                                                     
                                                                                
                                                                                
                                                                                
                                                                 |
-| ---------- | ---------- 
|------------|-----------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| _required_ | _required_ | _required_ | **`format-version`**        | An 
integer version number for the format. Implementations must throw an exception 
if a table's version is higher than the supported version.                      
                                                                                
                                                                                
              |
-| _optional_ | _required_ | _required_ | **`table-uuid`**            | A UUID 
that identifies the table, generated when the table is created. Implementations 
must throw an exception if a table's UUID does not match the expected UUID 
after refreshing metadata.                                                      
                                                                                
                                                               |
-| _required_ | _required_ | _required_ | **`location`**              | The 
table's base location. This is used by writers to determine where to store data 
files, manifest files, and table metadata files.                                
                                                                                
                                                                                
                                                             |
-|            | _required_ | _required_ | **`last-sequence-number`**  | The 
table's highest assigned sequence number, a monotonically increasing long that 
tracks the order of snapshots in a table.                                       
                                                                                
                                                                                
                                                              |
-| _required_ | _required_ | _required_ | **`last-updated-ms`**       | 
Timestamp in milliseconds from the unix epoch when the table was last updated. 
Each table metadata file should update this field just before writing.          
                                                                                
                                                                                
                                                                  |
-| _required_ | _required_ | _required_ | **`last-column-id`**        | An 
integer; the highest assigned column ID for the table. This is used to ensure 
columns are always assigned an unused ID when evolving schemas.                 
                                                                                
                                                                                
                                                                |
-| _required_ |            |            | **`schema`**                | The 
table’s current schema. (**Deprecated**: use `schemas` and `current-schema-id` 
instead)                                                                        
                                                                                
                                                                                
                                                              |
-| _optional_ | _required_ | _required_ | **`schemas`**               | A list 
of schemas, stored as objects with `schema-id`.                                 
                                                                                
                                                                                
                                                                                
                                                          |
-| _optional_ | _required_ | _required_ | **`current-schema-id`**     | ID of 
the table's current schema.                                                     
                                                                                
                                                                                
                                                                                
                                                           |
-| _required_ |            |            | **`partition-spec`**        | The 
table’s current partition spec, stored as only fields. Note that this is used 
by writers to partition data, but is not used when reading because reads use 
the specs stored in manifest files. (**Deprecated**: use `partition-specs` and 
`default-spec-id` instead)                                                      
                                                                   |
-| _optional_ | _required_ | _required_ | **`partition-specs`**       | A list 
of partition specs, stored as full partition spec objects.                      
                                                                                
                                                                                
                                                                                
                                                          |
-| _optional_ | _required_ | _required_ | **`default-spec-id`**       | ID of 
the "current" spec that writers should use by default.                          
                                                                                
                                                                                
                                                                                
                                                           |
-| _optional_ | _required_ | _required_ | **`last-partition-id`**     | An 
integer; the highest assigned partition field ID across all partition specs for 
the table. This is used to ensure partition fields are always assigned an 
unused ID when evolving specs.                                                  
                                                                                
                                                                    |
-| _optional_ | _optional_ | _optional_ | **`properties`**            | A 
string to string map of table properties. This is used to control settings that 
affect reading and writing and is not intended to be used for arbitrary 
metadata. For example, `commit.retry.num-retries` is used to control the number 
of commit retries.                                                              
                                                                       |
-| _optional_ | _optional_ | _optional_ | **`current-snapshot-id`**   | `long` 
ID of the current table snapshot; must be the same as the current ID of the 
`main` branch in `refs`.                                                        
                                                                                
                                                                                
                                                              |
-| _optional_ | _optional_ | _optional_ | **`snapshots`**             | A list 
of valid snapshots. Valid snapshots are snapshots for which all data files 
exist in the file system. A data file must not be deleted from the file system 
until the last snapshot in which it was listed is garbage collected.            
                                                                                
                                                                |
-| _optional_ | _optional_ | _optional_ | **`snapshot-log`**          | A list 
(optional) of timestamp and snapshot ID pairs that encodes changes to the 
current snapshot for the table. Each time the current-snapshot-id is changed, a 
new entry should be added with the last-updated-ms and the new 
current-snapshot-id. When snapshots are expired from the list of valid 
snapshots, all entries before a snapshot that has expired should be removed.    
          |
-| _optional_ | _optional_ | _optional_ | **`metadata-log`**          | A list 
(optional) of timestamp and metadata file location pairs that encodes changes 
to the previous metadata files for the table. Each time a new metadata file is 
created, a new entry of the previous metadata file location should be added to 
the list. Tables can be configured to remove oldest metadata log entries and 
keep a fixed-size log of the most recent entries after a commit. |
-| _optional_ | _required_ | _required_ | **`sort-orders`**           | A list 
of sort orders, stored as full sort order objects.                              
                                                                                
                                                                                
                                                                                
                                                          |
-| _optional_ | _required_ | _required_ | **`default-sort-order-id`** | Default 
sort order id of the table. Note that this could be used by writers, but is not 
used when reading because reads use the specs stored in manifest files.         
                                                                                
                                                                                
                                                         |
-|            | _optional_ | _optional_ | **`refs`**                  | A map 
of snapshot references. The map keys are the unique snapshot reference names in 
the table, and the map values are snapshot reference objects. There is always a 
`main` branch reference pointing to the `current-snapshot-id` even if the 
`refs` map is null.                                                             
                                                                 |
-| _optional_ | _optional_ | _optional_ | **`statistics`**            | A list 
(optional) of [table statistics](#table-statistics).                            
                                                                                
                                                                                
                                                                                
                                                          |
-| _optional_ | _optional_ | _optional_ | **`partition-statistics`**  | A list 
(optional) of [partition statistics](#partition-statistics).                    
                                                                                
                                                                                
                                                                                
                                                          |
-|            |            | _required_ | **`next-row-id`**           | A 
`long` higher than all assigned row IDs; the next snapshot's `first-row-id`. 
See [Row Lineage](#row-lineage).                                                
                                                                                
                                                                                
                                                                  |
-|            |            | _optional_ | **`encryption-keys`**       | A list 
(optional) of [encryption keys](#encryption-keys) used for table encryption. |
+=== "v1 - v3"
+    | v1         | v2         | v3         | Field                       | 
Description |
+    | ---------- | ---------- |------------|-----------------------------| 
------------|
+    | _required_ | _required_ | _required_ | **`format-version`**        | An 
integer version number for the format. Implementations must throw an exception 
if a table’s version is higher than the supported version. |
+    | _optional_ | _required_ | _required_ | **`table-uuid`**            | A 
UUID that identifies the table, generated when the table is created. 
Implementations must throw an exception if a table’s UUID does not match the 
expected UUID after refreshing metadata. |
+    | _required_ | _required_ | _required_ | **`location`**              | The 
table’s base location. This is used by writers to determine where to store data 
files, manifest files, and table metadata files. |
+    |            | _required_ | _required_ | **`last-sequence-number`**  | The 
table’s highest assigned sequence number, a monotonically increasing long that 
tracks the order of snapshots in a table. |
+    | _required_ | _required_ | _required_ | **`last-updated-ms`**       | 
Timestamp in milliseconds from the unix epoch when the table was last updated. 
Each table metadata file should update this field just before writing. |
+    | _required_ | _required_ | _required_ | **`last-column-id`**        | An 
integer; the highest assigned column ID for the table. This is used to ensure 
columns are always assigned an unused ID when evolving schemas. |
+    | _required_ |            |            | **`schema`**                | The 
table’s current schema. (**Deprecated**: use `schemas` and `current-schema-id` 
instead) |
+    | _optional_ | _required_ | _required_ | **`schemas`**               | A 
list of schemas, stored as objects with `schema-id`. |
+    | _optional_ | _required_ | _required_ | **`current-schema-id`**     | ID 
of the table’s current schema. |
+    | _required_ |            |            | **`partition-spec`**        | The 
table’s current partition spec, stored as only fields. (**Deprecated**: use 
`partition-specs` and `default-spec-id` instead) |

Review Comment:
   Content drop: the original description of the deprecated `partition-spec` 
field included "Note that this is used by writers to partition data, but is not 
used when reading because reads use the specs stored in manifest files." That 
sentence was removed in this PR. It explains *why* the single `partition-spec` 
field was safe to deprecate (readers never relied on it) and is useful context 
for anyone implementing v1 readers.
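
   To illustrate the writer/reader split that the removed sentence describes, 
here is a minimal sketch. It is not an Iceberg library API: plain dicts stand 
in for parsed table metadata and for a manifest file's key-value header, and 
the function names `spec_for_write` and `spec_for_read` are hypothetical. It 
also assumes the manifest header carries the spec fields as a JSON string under 
a `partition-spec` key, which is not shown in this diff.

```python
import json


def spec_for_write(table_metadata: dict) -> list:
    """Writer side: choose the partition fields used to partition new data."""
    if "partition-specs" in table_metadata:
        default_id = table_metadata["default-spec-id"]
        for spec in table_metadata["partition-specs"]:
            if spec["spec-id"] == default_id:
                return spec["fields"]
        raise ValueError(f"default-spec-id {default_id} not in partition-specs")
    # v1 fallback: the deprecated `partition-spec` field stores only the fields.
    return table_metadata["partition-spec"]


def spec_for_read(manifest_header: dict) -> list:
    """Reader side: use the spec stored with the manifest, never table metadata.

    Assumption: the header value is a JSON-encoded list of partition fields.
    """
    return json.loads(manifest_header["partition-spec"])


# Illustrative values only; the field contents are made up for this sketch.
day_field = {"source-id": 4, "field-id": 1000, "name": "ts_day", "transform": "day"}
v1_metadata = {"format-version": 1, "partition-spec": [day_field]}
manifest_header = {"partition-spec": json.dumps([day_field])}

write_fields = spec_for_write(v1_metadata)
read_fields = spec_for_read(manifest_header)
```

   In this sketch the reader never touches `table_metadata["partition-spec"]`, 
which is the point the dropped note made for v1 reader implementers.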



##########
format/spec.md:
##########
@@ -643,32 +645,33 @@ Notes:
 
 The `data_file` struct consists of the following fields:
 
-| v1         | v2         | v3         | Field id, name                    | 
Type                                                                        | 
Description                                                                     
                                                                                
                                                   |
-| ---------- 
|------------|------------|-----------------------------------|-----------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-|            | _required_ | _required_ | **`134  content`**                | 
`int` with meaning: `0: DATA`, `1: POSITION DELETES`, `2: EQUALITY DELETES` | 
Type of content stored by the data file: data, equality deletes, or position 
deletes (all v1 files are data files)                                           
                                                      |
-| _required_ | _required_ | _required_ | **`100  file_path`**              | 
`string`                                                                    | 
Full URI for the file with FS scheme                                            
                                                                                
                                                   |
-| _required_ | _required_ | _required_ | **`101  file_format`**            | 
`string`                                                                    | 
String file format name, `avro`, `orc`, `parquet`, or `puffin`                  
                                                                                
                                                   |
-| _required_ | _required_ | _required_ | **`102  partition`**              | 
`struct<...>`                                                               | 
Partition data tuple, schema based on the partition spec output using partition 
field ids for the struct field ids                                              
                                                   |
-| _required_ | _required_ | _required_ | **`103  record_count`**           | 
`long`                                                                      | 
Number of records in this file, or the cardinality of a deletion vector         
                                                                                
                                                   |
-| _required_ | _required_ | _required_ | **`104  file_size_in_bytes`**     | 
`long`                                                                      | 
Total file size in bytes                                                        
                                                                                
                                                   |
-| _required_ |            |            | ~~**`105 block_size_in_bytes`**~~ | 
`long`                                                                      | 
**Deprecated. Always write a default in v1. Do not write in v2 or v3.**         
                                                                                
                                                   |
-| _optional_ |            |            | ~~**`106  file_ordinal`**~~       | 
`int`                                                                       | 
**Deprecated. Do not write.**                                                   
                                                                                
                                                   |
-| _optional_ |            |            | ~~**`107  sort_columns`**~~       | 
`list<112: int>`                                                            | 
**Deprecated. Do not write.**                                                   
                                                                                
                                                   |
-| _optional_ | _optional_ | _optional_ | **`108  column_sizes`**           | 
`map<117: int, 118: long>`                                                  | 
Map from column id to the total size on disk of all regions that store the 
column. Does not include bytes necessary to read other columns, like footers. 
Leave null for row-oriented formats (Avro)                |
-| _optional_ | _optional_ | _optional_ | **`109  value_counts`**           | 
`map<119: int, 120: long>`                                                  | 
Map from column id to number of values in the column (including null and NaN 
values)                                                                         
                                                      |
-| _optional_ | _optional_ | _optional_ | **`110  null_value_counts`**      | 
`map<121: int, 122: long>`                                                  | 
Map from column id to number of null values in the column                       
                                                                                
                                                   |
-| _optional_ | _optional_ | _optional_ | **`137  nan_value_counts`**       | 
`map<138: int, 139: long>`                                                  | 
Map from column id to number of NaN values in the column                        
                                                                                
                                                   |
-| _optional_ | _optional_ |            | ~~**`111  distinct_counts`**~~    | 
`map<123: int, 124: long>`                                                  | 
**Deprecated. Do not write.**                                                   
                                                                                
                                                   |
-| _optional_ | _optional_ | _optional_ | **`125  lower_bounds`**           | 
`map<126: int, 127: binary>`                                                | 
Map from column id to lower bound in the column serialized as binary [1]. Each 
value must be less than or equal to all non-null, non-NaN values in the column 
for the file [2]                                     |
-| _optional_ | _optional_ | _optional_ | **`128  upper_bounds`**           | 
`map<129: int, 130: binary>`                                                | 
Map from column id to upper bound in the column serialized as binary [1]. Each 
value must be greater than or equal to all non-null, non-Nan values in the 
column for the file [2]                                  |
-| _optional_ | _optional_ | _optional_ | **`131  key_metadata`**           | 
`binary`                                                                    | 
Implementation-specific key metadata for encryption                             
                                                                                
                                                   |
-| _optional_ | _optional_ | _optional_ | **`132  split_offsets`**          | 
`list<133: long>`                                                           | 
Split offsets for the data file. For example, all row group offsets in a 
Parquet file. Must be sorted ascending                                          
                                                          |
-|            | _optional_ | _optional_ | **`135  equality_ids`**           | 
`list<136: int>`                                                            | 
Field ids used to determine row equality in equality delete files. Required 
when `content=2` and should be null otherwise. Fields with ids listed in this 
column must be present in the delete file                |
-| _optional_ | _optional_ | _optional_ | **`140  sort_order_id`**          | 
`int`                                                                       | 
ID representing sort order for this file [3].                                   
                                                                                
                                                   |
-|            |            | _optional_ | **`142  first_row_id`**           | 
`long`                                                                      | 
The `_row_id` for the first row in the data file. See [First Row ID 
Inheritance](#first-row-id-inheritance)                                         
                                                               |
-|            | _optional_ | _optional_ | **`143  referenced_data_file`**   | 
`string`                                                                    | 
Fully qualified location (URI with FS scheme) of a data file that all deletes 
reference [4]                                                                   
                                                     |
-|            |            | _optional_ | **`144  content_offset`**         | 
`long`                                                                      | 
The offset in the file where the content starts [5]                             
                                                                                
                                                   |
-|            |            | _optional_ | **`145  content_size_in_bytes`**  | 
`long`                                                                      | 
The length of a referenced content stored in the file; required if 
`content_offset` is present [5]                                                 
                                                                |
+=== "v1 - v3"
+    | v1         | v2         | v3         | Field id, name                    
| Type                                                                        | 
Description |
+    | ---------- 
|------------|------------|-----------------------------------|-----------------------------------------------------------------------------|-------------|
+    |            | _required_ | _required_ | **`134  content`**                
| `int` with meaning: `0: DATA`, `1: POSITION DELETES`, `2: EQUALITY DELETES` | 
Type of content stored by the data file: data, equality deletes, or position 
deletes (all v1 files are data files) |
+    | _required_ | _required_ | _required_ | **`100  file_path`**              
| `string`                                                                    | 
Full URI for the file with FS scheme |
+    | _required_ | _required_ | _required_ | **`101  file_format`**            
| `string`                                                                    | 
String file format name, `avro`, `orc`, `parquet`, or `puffin` |
+    | _required_ | _required_ | _required_ | **`102  partition`**              
| `struct<...>`                                                               | 
Partition data tuple, schema based on the partition spec output using partition 
field ids for the struct field ids |
+    | _required_ | _required_ | _required_ | **`103  record_count`**           
| `long`                                                                      | 
Number of records in this file, or the cardinality of a deletion vector |
+    | _required_ | _required_ | _required_ | **`104  file_size_in_bytes`**     
| `long`                                                                      | 
Total file size in bytes |
+    | _required_ |            |            | ~~**`105 block_size_in_bytes`**~~ 
| `long`                                                                      | 
**Deprecated. Always write a default in v1. Do not write in v2 or v3.** |
+    | _optional_ |            |            | ~~**`106  file_ordinal`**~~       
| `int`                                                                       | 
**Deprecated. Do not write.** |
+    | _optional_ |            |            | ~~**`107  sort_columns`**~~       
| `list<112: int>`                                                            | 
**Deprecated. Do not write.** |
+    | _optional_ | _optional_ | _optional_ | **`108  column_sizes`**           
| `map<117: int, 118: long>`                                                  | 
Map from column id to the total size on disk of all regions that store the 
column. Leave null for row-oriented formats (Avro) |

Review Comment:
   Content drop: the original description was "Map from column id to the total
size on disk of all regions that store the column. **Does not include bytes
necessary to read other columns, like footers.** Leave null for row-oriented
formats (Avro)". This PR removes the bolded sentence, which conveyed real
information about what this metric does and does not account for. It should be
restored if this PR is meant to be formatting-only.
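
   One way to catch drops like this mechanically is to compare the old and new
description cells sentence by sentence. The snippet below is only a rough
sketch: `dropped_sentences` is a hypothetical helper, not part of the Iceberg
build or docs tooling, and it assumes sentences end with a period followed by
whitespace.

```python
import re


def dropped_sentences(old: str, new: str) -> list[str]:
    """Return sentences of `old` whose normalized text does not appear in `new`."""

    def normalize(text: str) -> str:
        # Strip markdown bold markers and collapse whitespace so that
        # re-wrapping and emphasis changes are not reported as drops.
        return re.sub(r"\s+", " ", text.replace("**", "")).strip()

    new_norm = normalize(new)
    # Naive sentence split: a period followed by whitespace ends a sentence.
    sentences = re.split(r"(?<=\.)\s+", normalize(old))
    return [s for s in sentences if s and s not in new_norm]


old = (
    "Map from column id to the total size on disk of all regions that store "
    "the column. Does not include bytes necessary to read other columns, "
    "like footers. Leave null for row-oriented formats (Avro)"
)
new = (
    "Map from column id to the total size on disk of all regions that store "
    "the column. Leave null for row-oriented formats (Avro)"
)

for sentence in dropped_sentences(old, new):
    print(sentence)
```

   Run against the `column_sizes` row, this reports only the footer sentence,
which matches the drop described above.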



##########
format/spec.md:
##########
@@ -921,33 +926,34 @@ The atomic operation used to commit metadata depends on 
how tables are tracked a
 
 Table metadata consists of the following fields:
 
-| v1         | v2         | v3         | Field                       | 
Description                                                                     
                                                                                
                                                                                
                                                                                
                                                                 |
-| ---------- | ---------- 
|------------|-----------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| _required_ | _required_ | _required_ | **`format-version`**        | An 
integer version number for the format. Implementations must throw an exception 
if a table's version is higher than the supported version.                      
                                                                                
                                                                                
              |
-| _optional_ | _required_ | _required_ | **`table-uuid`**            | A UUID 
that identifies the table, generated when the table is created. Implementations 
must throw an exception if a table's UUID does not match the expected UUID 
after refreshing metadata.                                                      
                                                                                
                                                               |
-| _required_ | _required_ | _required_ | **`location`**              | The 
table's base location. This is used by writers to determine where to store data 
files, manifest files, and table metadata files.                                
                                                                                
                                                                                
                                                             |
-|            | _required_ | _required_ | **`last-sequence-number`**  | The 
table's highest assigned sequence number, a monotonically increasing long that 
tracks the order of snapshots in a table.                                       
                                                                                
                                                                                
                                                              |
-| _required_ | _required_ | _required_ | **`last-updated-ms`**       | 
Timestamp in milliseconds from the unix epoch when the table was last updated. 
Each table metadata file should update this field just before writing.          
                                                                                
                                                                                
                                                                  |
-| _required_ | _required_ | _required_ | **`last-column-id`**        | An 
integer; the highest assigned column ID for the table. This is used to ensure 
columns are always assigned an unused ID when evolving schemas.                 
                                                                                
                                                                                
                                                                |
-| _required_ |            |            | **`schema`**                | The 
table’s current schema. (**Deprecated**: use `schemas` and `current-schema-id` 
instead)                                                                        
                                                                                
                                                                                
                                                              |
-| _optional_ | _required_ | _required_ | **`schemas`**               | A list 
of schemas, stored as objects with `schema-id`.                                 
                                                                                
                                                                                
                                                                                
                                                          |
-| _optional_ | _required_ | _required_ | **`current-schema-id`**     | ID of 
the table's current schema.                                                     
                                                                                
                                                                                
                                                                                
                                                           |
-| _required_ |            |            | **`partition-spec`**        | The 
table’s current partition spec, stored as only fields. Note that this is used 
by writers to partition data, but is not used when reading because reads use 
the specs stored in manifest files. (**Deprecated**: use `partition-specs` and 
`default-spec-id` instead)                                                      
                                                                   |
-| _optional_ | _required_ | _required_ | **`partition-specs`**       | A list 
of partition specs, stored as full partition spec objects.                      
                                                                                
                                                                                
                                                                                
                                                          |
-| _optional_ | _required_ | _required_ | **`default-spec-id`**       | ID of 
the "current" spec that writers should use by default.                          
                                                                                
                                                                                
                                                                                
                                                           |
-| _optional_ | _required_ | _required_ | **`last-partition-id`**     | An 
integer; the highest assigned partition field ID across all partition specs for 
the table. This is used to ensure partition fields are always assigned an 
unused ID when evolving specs.                                                  
                                                                                
                                                                    |
-| _optional_ | _optional_ | _optional_ | **`properties`**            | A 
string to string map of table properties. This is used to control settings that 
affect reading and writing and is not intended to be used for arbitrary 
metadata. For example, `commit.retry.num-retries` is used to control the number 
of commit retries.                                                              
                                                                       |
-| _optional_ | _optional_ | _optional_ | **`current-snapshot-id`**   | `long` 
ID of the current table snapshot; must be the same as the current ID of the 
`main` branch in `refs`.                                                        
                                                                                
                                                                                
                                                              |
-| _optional_ | _optional_ | _optional_ | **`snapshots`**             | A list 
of valid snapshots. Valid snapshots are snapshots for which all data files 
exist in the file system. A data file must not be deleted from the file system 
until the last snapshot in which it was listed is garbage collected.            
                                                                                
                                                                |
-| _optional_ | _optional_ | _optional_ | **`snapshot-log`**          | A list 
(optional) of timestamp and snapshot ID pairs that encodes changes to the 
current snapshot for the table. Each time the current-snapshot-id is changed, a 
new entry should be added with the last-updated-ms and the new 
current-snapshot-id. When snapshots are expired from the list of valid 
snapshots, all entries before a snapshot that has expired should be removed.    
          |
-| _optional_ | _optional_ | _optional_ | **`metadata-log`**          | A list 
(optional) of timestamp and metadata file location pairs that encodes changes 
to the previous metadata files for the table. Each time a new metadata file is 
created, a new entry of the previous metadata file location should be added to 
the list. Tables can be configured to remove oldest metadata log entries and 
keep a fixed-size log of the most recent entries after a commit. |
-| _optional_ | _required_ | _required_ | **`sort-orders`**           | A list 
of sort orders, stored as full sort order objects.                              
                                                                                
                                                                                
                                                                                
                                                          |
-| _optional_ | _required_ | _required_ | **`default-sort-order-id`** | Default 
sort order id of the table. Note that this could be used by writers, but is not 
used when reading because reads use the specs stored in manifest files.         
                                                                                
                                                                                
                                                         |
-|            | _optional_ | _optional_ | **`refs`**                  | A map 
of snapshot references. The map keys are the unique snapshot reference names in 
the table, and the map values are snapshot reference objects. There is always a 
`main` branch reference pointing to the `current-snapshot-id` even if the 
`refs` map is null.                                                             
                                                                 |
-| _optional_ | _optional_ | _optional_ | **`statistics`**            | A list 
(optional) of [table statistics](#table-statistics).                            
                                                                                
                                                                                
                                                                                
                                                          |
-| _optional_ | _optional_ | _optional_ | **`partition-statistics`**  | A list 
(optional) of [partition statistics](#partition-statistics).                    
                                                                                
                                                                                
                                                                                
                                                          |
-|            |            | _required_ | **`next-row-id`**           | A 
`long` higher than all assigned row IDs; the next snapshot's `first-row-id`. 
See [Row Lineage](#row-lineage).                                                
                                                                                
                                                                                
                                                                  |
-|            |            | _optional_ | **`encryption-keys`**       | A list 
(optional) of [encryption keys](#encryption-keys) used for table encryption. |
+=== "v1 - v3"
+    | v1         | v2         | v3         | Field                       | 
Description |
+    | ---------- | ---------- |------------|-----------------------------| 
------------|
+    | _required_ | _required_ | _required_ | **`format-version`**        | An 
integer version number for the format. Implementations must throw an exception 
if a table’s version is higher than the supported version. |
+    | _optional_ | _required_ | _required_ | **`table-uuid`**            | A 
UUID that identifies the table, generated when the table is created. 
Implementations must throw an exception if a table’s UUID does not match the 
expected UUID after refreshing metadata. |
+    | _required_ | _required_ | _required_ | **`location`**              | The 
table’s base location. This is used by writers to determine where to store data 
files, manifest files, and table metadata files. |
+    |            | _required_ | _required_ | **`last-sequence-number`**  | The 
table’s highest assigned sequence number, a monotonically increasing long that 
tracks the order of snapshots in a table. |
+    | _required_ | _required_ | _required_ | **`last-updated-ms`**       | 
Timestamp in milliseconds from the unix epoch when the table was last updated. 
Each table metadata file should update this field just before writing. |
+    | _required_ | _required_ | _required_ | **`last-column-id`**        | An 
integer; the highest assigned column ID for the table. This is used to ensure 
columns are always assigned an unused ID when evolving schemas. |
+    | _required_ |            |            | **`schema`**                | The 
table’s current schema. (**Deprecated**: use `schemas` and `current-schema-id` 
instead) |
+    | _optional_ | _required_ | _required_ | **`schemas`**               | A 
list of schemas, stored as objects with `schema-id`. |
+    | _optional_ | _required_ | _required_ | **`current-schema-id`**     | ID 
of the table’s current schema. |
+    | _required_ |            |            | **`partition-spec`**        | The 
table’s current partition spec, stored as only fields. (**Deprecated**: use 
`partition-specs` and `default-spec-id` instead) |
+    | _optional_ | _required_ | _required_ | **`partition-specs`**       | A 
list of partition specs, stored as full partition spec objects. |
+    | _optional_ | _required_ | _required_ | **`default-spec-id`**       | ID 
of the "current" spec that writers should use by default. |
+    | _optional_ | _required_ | _required_ | **`last-partition-id`**     | An 
integer; the highest assigned partition field ID across all partition specs for 
the table. This is used to ensure partition fields are always assigned an 
unused ID when evolving specs. |
+    | _optional_ | _optional_ | _optional_ | **`properties`**            | A 
string to string map of table properties. This is used to control settings that 
affect reading and writing and is not intended to be used for arbitrary 
metadata. |

Review Comment:
   Content drop: the original description ended with "For example,
`commit.retry.num-retries` is used to control the number of commit retries."
The example clarified what kinds of properties this map is for (as opposed to
arbitrary user metadata). The example property is also referenced at
`format/spec.md:1548` in the JSON example, so deleting it here orphans that
reference.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

