flyrain commented on code in PR #6002:
URL: https://github.com/apache/iceberg/pull/6002#discussion_r1006133472


##########
format/spec.md:
##########
@@ -463,8 +464,9 @@ When a file is added to the dataset, its manifest entry 
should store the snapsho
 
 When a file is replaced or deleted from the dataset, its manifest entry fields 
store the snapshot ID in which the file was deleted and status 2 (deleted). The 
file may be deleted from the file system when the snapshot in which it was 
deleted is garbage collected, assuming that older snapshots have also been 
garbage collected [1].
 
-Iceberg v2 adds a sequence number to the entry and makes the snapshot id 
optional. Both fields, `sequence_number` and `snapshot_id`, are inherited from 
manifest metadata when `null`. That is, if the field is `null` for an entry, 
then the entry must inherit its value from the manifest file's metadata, stored 
in the manifest list [2].
-The `sequence_number` field represents the data sequence number and must never 
change after a file is added to the dataset. 
+Iceberg v2 adds data and file sequence numbers to the entry and makes the 
snapshot ID optional. Values for these fields are inherited from manifest 
metadata when `null`. That is, if the field is `null` for an entry, then the 
entry must inherit its value from the manifest file's metadata, stored in the 
manifest list.
+The `sequence_number` field represents the data sequence number and must never 
change after a file is added to the dataset. The `file_sequence_number` field 
represents the sequence number of the snapshot that added the file and must 
also remain unchanged upon assigning at commit.

Review Comment:
   The difference between the two is a bit hard to understand. Can we add 
something like this?
   ```
   In the case of compaction, the `sequence_number` is carried over from the 
original files, but the `file_sequence_number` is always the one from the 
snapshot that added the file. The `file_sequence_number` of a file may be 
larger than or the same as its `sequence_number`.
   ```
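
To make the distinction concrete, here is a rough Python sketch of the behavior described in the suggested wording. It is not Iceberg code: `ManifestEntry`, `resolve`, and `compact` are hypothetical names, the file paths and sequence numbers are made up, and the sketch assumes that all files being compacted share one data sequence number.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ManifestEntry:
    """Hypothetical, simplified stand-in for a v2 manifest entry."""
    path: str
    sequence_number: Optional[int]       # data sequence number; None means "inherit"
    file_sequence_number: Optional[int]  # None means "inherit"


def resolve(entry: ManifestEntry, manifest_sequence_number: int) -> ManifestEntry:
    """Fill null sequence numbers from the manifest's metadata."""
    data_seq = entry.sequence_number
    if data_seq is None:
        data_seq = manifest_sequence_number
    file_seq = entry.file_sequence_number
    if file_seq is None:
        file_seq = manifest_sequence_number
    return ManifestEntry(entry.path, data_seq, file_seq)


def compact(inputs: List[ManifestEntry], output_path: str,
            commit_sequence_number: int) -> ManifestEntry:
    """Model a compaction that keeps the inputs' data sequence number.

    Assumption for this sketch: every input has the same, already
    resolved data sequence number.
    """
    data_seqs = {e.sequence_number for e in inputs}
    if len(data_seqs) != 1 or None in data_seqs:
        raise ValueError("sketch expects one resolved data sequence number")
    # The data sequence number is carried over; the file sequence number
    # is the sequence number of the snapshot that adds the new file.
    return ManifestEntry(output_path, data_seqs.pop(), commit_sequence_number)


# Two files added by a snapshot with sequence number 3, entries left null.
a = resolve(ManifestEntry("a.parquet", None, None), 3)
b = resolve(ManifestEntry("b.parquet", None, None), 3)

# A later snapshot with sequence number 7 compacts them into one file.
c = compact([a, b], "c.parquet", 7)
assert c.file_sequence_number >= c.sequence_number
```

In this model the compacted file keeps data sequence number 3 but gets file sequence number 7, which is the case where the two fields differ. For a file that is simply added, both values come from the same snapshot and are equal.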


