rdblue commented on a change in pull request #887: Define file and position 
based deletion file in spec
URL: https://github.com/apache/incubator-iceberg/pull/887#discussion_r401755240
 
 

 ##########
 File path: site/docs/spec.md
 ##########
 @@ -254,6 +255,33 @@ Notes:
 
 1. Technically, data files can be deleted when the last snapshot that contains 
the file as “live” data is garbage collected. But this is harder to detect and 
requires finding the diff of multiple snapshots. It is easier to track what 
files are deleted in a snapshot and delete them when that snapshot expires.
 
+#### Deletion Files
+
+Deletion files are files that indicate deletions of pre-existing rows to be 
applied to the dataset at read time. Deletion files may either specify rows by 
column value or by file name and row position.
+
+1. The file and position based deletion file has the schema as following:
 
 Review comment:
   This needs to clearly define what file and position are. I think that `file` 
should be renamed to `file_path` to match tracking in the manifest file and 
should use a similar description, along with a note that it must match what's 
in the manifest. We also need to note that positions start at 0.
   
   * `file_path` - The full URI of a data file, with FS scheme. This must match 
the `file_path` of the target data file in a manifest entry.
   * `position` - The ordinal position of a deleted row in the target data file 
identified by `file_path`, starting at 0.
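
   To make the intended semantics concrete, here is a minimal sketch of how a reader might apply such a file at read time. It is an illustration only, not Iceberg's implementation: `PositionDelete`, `live_rows`, and the example paths are hypothetical names, and it assumes `file_path` is matched by exact string comparison against the manifest entry.

```python
from typing import Iterable, Iterator, NamedTuple


class PositionDelete(NamedTuple):
    """One record of a file-and-position deletion file (hypothetical type)."""
    file_path: str  # full URI of the data file, as recorded in the manifest entry
    position: int   # ordinal position of the deleted row, starting at 0


def live_rows(file_path: str, rows: Iterable[str],
              deletes: Iterable[PositionDelete]) -> Iterator[str]:
    """Yield the rows of one data file that are not marked as deleted."""
    # Assumption: paths are compared as exact strings, so the deletion file
    # must use the same URI that the manifest uses for the data file.
    deleted = {d.position for d in deletes if d.file_path == file_path}
    for position, row in enumerate(rows):  # enumerate starts at 0
        if position not in deleted:
            yield row


# Hypothetical paths, for illustration only.
data_file = "hdfs://nn:8020/warehouse/db/tbl/data/part-00000.parquet"
other_file = "hdfs://nn:8020/warehouse/db/tbl/data/part-00001.parquet"
deletes = [
    PositionDelete(data_file, 0),
    PositionDelete(data_file, 2),
    PositionDelete(other_file, 1),  # targets a different file; ignored here
]
result = list(live_rows(data_file, ["a", "b", "c", "d"], deletes))
print(result)  # ['b', 'd']
```

Under these assumptions, a position of 0 removes the first row of the file, and a delete whose `file_path` differs from the manifest's value in any way would silently fail to apply, which is why the spec should state both points.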
