szehon-ho commented on code in PR #7105:
URL: https://github.com/apache/iceberg/pull/7105#discussion_r1319945486


##########
format/spec.md:
##########
@@ -702,6 +703,49 @@ Blob metadata is a struct with the following fields:
 | _optional_ | _optional_ | **`properties`** | `map<string, string>` | 
Additional properties associated with the statistic. Subset of Blob properties 
in the Puffin file. |
 
 
+#### Partition statistics
+
+Partition statistics files are based on [Partition Statistics file 
spec](#partition-statistics-file). 
+Partition statistics are not required for reading or planning and readers may 
ignore them.
+Each table snapshot may be associated with at most one partition statistic 
file.
+A writer can optionally write the partition statistics file during each write 
operation. If the statistics file is written for the specific snapshot,
+it must be registered in the table metadata file to be considered as a valid 
statistics file for the reader.
+
+Partition statistics files metadata within `partition-statistics` table 
metadata field is a struct with the following fields:
+
+| v1 | v2 | Field name | Type | Description |
+|----|----|------------|------|-------------|
+| _required_ | _required_ | **`snapshot-id`** | `long` | ID of the Iceberg 
table's snapshot the partition statistics file is associated with. |
+| _required_ | _required_ | **`statistics-file-path`** | `string` | Path of 
the partition statistics file. See [Partition Statistics 
file](#partition-statistics-file). |
+| _required_ | _required_ | **`max-data-sequence-number`** | `long` | Maximum 
data sequence number of the Iceberg table's snapshot the partition statistics 
was computed from. |
+
+#### Partition Statistics file
+
+Statistics information for each unique partition tuple is stored as a row in 
the default data file format of the table (for example, Parquet or ORC).
+These rows are sorted (in ascending manner with NULL FIRST) based on all 
partition columns from `partition` in the same order

Review Comment:
   Nit: can we simplify to just
   
   `These rows must be sorted (in ascending manner with NULL FIRST) by 
partition to optimize...` ?
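
To make "ascending with NULL FIRST" concrete, here is a minimal Python sketch of sorting partition tuples so that a null in any field sorts before every non-null value in that field. The sample tuples and the helper name `nulls_first_key` are hypothetical and not taken from the PR; this only illustrates the ordering, not how Iceberg implements it.

```python
from typing import Optional, Tuple

# A partition tuple is modeled here as a plain Python tuple whose fields may be None.
PartitionTuple = Tuple[Optional[object], ...]


def nulls_first_key(partition: PartitionTuple) -> tuple:
    """Build a sort key that orders fields ascending, with None before any value.

    Each field becomes a pair: (0, 0) for None, (1, value) otherwise.
    The leading 0/1 flag decides null-vs-non-null, so a None is never
    compared directly against a real value.
    """
    return tuple((0, 0) if value is None else (1, value) for value in partition)


# Hypothetical partition tuples: (day, bucket)
rows = [
    ("2023-01-02", 5),
    (None, 3),
    ("2023-01-01", None),
    ("2023-01-01", 7),
]

# Sort by all partition fields, in order, ascending with nulls first.
sorted_rows = sorted(rows, key=nulls_first_key)
```

Sorting on the whole tuple in field order is what "by partition" would mean under this reading: the first field dominates, and later fields only break ties.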



##########
format/spec.md:
##########
@@ -702,6 +703,49 @@ Blob metadata is a struct with the following fields:
 | _optional_ | _optional_ | **`properties`** | `map<string, string>` | 
Additional properties associated with the statistic. Subset of Blob properties 
in the Puffin file. |
 
 
+#### Partition statistics
+
+Partition statistics files are based on [Partition Statistics file 
spec](#partition-statistics-file). 
+Partition statistics are not required for reading or planning and readers may 
ignore them.
+Each table snapshot may be associated with at most one partition statistic 
file.
+A writer can optionally write the partition statistics file during each write 
operation. If the statistics file is written for the specific snapshot,
+it must be registered in the table metadata file to be considered as a valid 
statistics file for the reader.
+
+Partition statistics files metadata within `partition-statistics` table 
metadata field is a struct with the following fields:

Review Comment:
   Nit: this sentence does not make much sense as written. Would this suffice?
   
   `Partition statistics files contain a struct 'partition-statistics' with the 
following fields`
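
For readers following along, a rough Python sketch of what one such struct could look like, using only the three required fields listed in the quoted diff. The concrete values, the file path, and the helper `missing_or_mistyped` are made up for illustration and are not part of the spec or the PR.

```python
# Required fields and their expected Python types, as listed in the quoted diff
# (`long` is modeled as int, `string` as str).
REQUIRED_FIELDS = {
    "snapshot-id": int,
    "statistics-file-path": str,
    "max-data-sequence-number": int,
}


def missing_or_mistyped(entry: dict) -> list:
    """Return the sorted names of required fields that are absent or have the wrong type."""
    return sorted(
        name
        for name, expected_type in REQUIRED_FIELDS.items()
        if not isinstance(entry.get(name), expected_type)
    )


# Hypothetical entry; every value below is invented for this example.
example_entry = {
    "snapshot-id": 1234567890,
    "statistics-file-path": "s3://example-bucket/table/metadata/partition-stats.parquet",
    "max-data-sequence-number": 42,
}

problems = missing_or_mistyped(example_entry)
```

An empty `problems` list means all three required fields are present with the expected types.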



##########
format/spec.md:
##########
@@ -702,6 +703,49 @@ Blob metadata is a struct with the following fields:
 | _optional_ | _optional_ | **`properties`** | `map<string, string>` | 
Additional properties associated with the statistic. Subset of Blob properties 
in the Puffin file. |
 
 
+#### Partition statistics
+
+Partition statistics files are based on [Partition Statistics file 
spec](#partition-statistics-file). 
+Partition statistics are not required for reading or planning and readers may 
ignore them.
+Each table snapshot may be associated with at most one partition statistic 
file.
+A writer can optionally write the partition statistics file during each write 
operation. If the statistics file is written for the specific snapshot,

Review Comment:
   Nit: I am not sure these two sentences add much value. Isn't this the case 
for any file reference in Iceberg?
   
     
   ```A writer can optionally write the partition statistics file during each 
write operation. If the statistics file is written for the specific snapshot, 
it must be registered in the table metadata file to be considered as a valid 
statistics file for the reader.```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

