rdblue commented on code in PR #4945:
URL: https://github.com/apache/iceberg/pull/4945#discussion_r929206942


##########
format/spec.md:
##########
@@ -665,9 +665,34 @@ Table metadata consists of the following fields:
 | _optional_ | _required_ | **`sort-orders`**| A list of sort orders, stored as full sort order objects. |
 | _optional_ | _required_ | **`default-sort-order-id`**| Default sort order id of the table. Note that this could be used by writers, but is not used when reading because reads use the specs stored in manifest files. |
 |            | _optional_ | **`refs`** | A map of snapshot references. The map keys are the unique snapshot reference names in the table, and the map values are snapshot reference objects. There is always a `main` branch reference pointing to the `current-snapshot-id` even if the `refs` map is null. |
+| _optional_ | _optional_ | **`snapshot-statistics`** | A list (optional) of [table statistics](#table-statistics). |
 
 For serialization details, see Appendix C.
 
+#### Table statistics
+
+Table statistics files are valid [Puffin files](../puffin-spec). Statistics are informational. A reader can choose to
+ignore statistics information. Statistics support is not required to read the table correctly. A table can contain
+many statistics files associated with different table snapshots.
+
+Statistics files metadata within `snapshot-statistics` table metadata field is a struct with the following fields:
+
+| v1         | v2         | Field name                      | Type                              | Description |
+|------------|------------|---------------------------------|-----------------------------------|-------------|
+| _required_ | _required_ | **`snapshot-id`**               | `string`                          | ID of the Iceberg table's snapshot the statistics were computed from. |
+| _required_ | _required_ | **`statistics-path`**           | `string`                          | Path of the statistics file. See [Puffin file format](../puffin-spec). |
+| _required_ | _required_ | **`file-size-in-bytes`**        | `long`                            | Size of the statistics file. |
+| _required_ | _required_ | **`file-footer-size-in-bytes`** | `long`                            | Total size of the statistics file's footer (not the footer payload size). See [Puffin file format](../puffin-spec) for footer definition. |
+| _required_ | _required_ | **`blob-metadata`**             | `list<blob metadata>` (see below) | A list of the blob metadata for statistics contained in the file with structure described below. |
+
+Blob metadata is a struct with the following fields:
+
+| v1         | v2         | Field name       | Type                  | Description |
+|------------|------------|------------------|-----------------------|-------------|
+| _required_ | _required_ | **`type`**       | `string`              | Type of the blob. Matches Blob type in the Puffin file. |
+| _required_ | _required_ | **`fields`**     | `list<integer>`       | Ordered list of fields, given by field ID, on which the statistic was calculated. |
+| _required_ | _optional_ | **`properties`** | `map<string, string>` | Additional properties associated with the statistic. Subset of Blob properties in the Puffin file. |

Review Comment:
   I think it is probably a typo that this is required in v1 but optional in v2.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

