findepi commented on code in PR #4945:
URL: https://github.com/apache/iceberg/pull/4945#discussion_r905969472


##########
format/spec.md:
##########
@@ -486,16 +486,17 @@ When reading v1 manifests with no sequence number column, 
sequence numbers for a
 
 A snapshot consists of the following fields:
 
-| v1         | v2         | Field                    | Description |
-| ---------- | ---------- | ------------------------ | ----------- |
-| _required_ | _required_ | **`snapshot-id`**        | A unique long ID |
-| _optional_ | _optional_ | **`parent-snapshot-id`** | The snapshot ID of the 
snapshot's parent. Omitted for any snapshot with no parent |
-|            | _required_ | **`sequence-number`**    | A monotonically 
increasing long that tracks the order of changes to a table |
-| _required_ | _required_ | **`timestamp-ms`**       | A timestamp when the 
snapshot was created, used for garbage collection and table inspection |
-| _optional_ | _required_ | **`manifest-list`**      | The location of a 
manifest list for this snapshot that tracks manifest files with additional 
metadata |
-| _optional_ |            | **`manifests`**          | A list of manifest file 
locations. Must be omitted if `manifest-list` is present |
-| _optional_ | _required_ | **`summary`**            | A string map that 
summarizes the snapshot changes, including `operation` (see below) |
-| _optional_ | _optional_ | **`schema-id`**          | ID of the table's 
current schema when the snapshot was created |
+| v1         | v2         | Field                    | Description             
                                                                                
                                                        |
+| ---------- | ---------- | ------------------------ 
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| _required_ | _required_ | **`snapshot-id`**        | A unique long ID        
                                                                                
                                                        |
+| _optional_ | _optional_ | **`parent-snapshot-id`** | The snapshot ID of the 
snapshot's parent. Omitted for any snapshot with no parent                      
                                                         |
+|            | _required_ | **`sequence-number`**    | A monotonically 
increasing long that tracks the order of changes to a table                     
                                                                |
+| _required_ | _required_ | **`timestamp-ms`**       | A timestamp when the 
snapshot was created, used for garbage collection and table inspection          
                                                           |
+| _optional_ | _required_ | **`manifest-list`**      | The location of a 
manifest list for this snapshot that tracks manifest files with additional 
metadata                                                           |
+| _optional_ |            | **`manifests`**          | A list of manifest file 
locations. Must be omitted if `manifest-list` is present                        
                                                        |
+| _optional_ | _required_ | **`summary`**            | A string map that 
summarizes the snapshot changes, including `operation` (see below)              
                                                              |
+| _optional_ | _optional_ | **`schema-id`**          | ID of the table's 
current schema when the snapshot was created                                    
                                                              |
+| _optional_ | _optional_ | **`statistics`**         | A [statistics file's 
metadata](#statistics-file). The field should be retained by writers, unless 
writer updates the statistics, or knows they became obsolete. |

Review Comment:
   > it shouldn't be so hard for the new client find the most recent stats 
files and carry them over, right?
   
   @flyrain 
   with current implementation --
   iceberg library will do this automatically.
   
   however, clients using old Iceberg library will "forget" stats when creating 
new snapshot (eg inserting new data)
   
   > I would probably prefer that new structure, or something similar. What do 
you think, @findepi and @flyrain?
   
   
   @rdblue i am not fluent in Iceberg Table / Snapshot intricacies, lifecycles, 
and branching, so I don't have opinion.
   
   I know what we need to be able to
   - attach a stats file to a table 
   - update/replace/remove existing stats file
   - retain existing stats file when doing eg DML on a table 
   - reading old snapshot (time travel) should be able to use the stat file as 
was present in the table at that time
   
   i believe attaching the stats file to snapshot addresses these needs.
   if there is some other and better way to address these needs, please be 
explicit. I still need guidance in Iceberg world.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to