rdblue commented on a change in pull request #887: Define file and position based deletion file in spec
URL: https://github.com/apache/incubator-iceberg/pull/887#discussion_r401755240

##########
File path: site/docs/spec.md
##########
@@ -254,6 +255,33 @@ Notes:
 1. Technically, data files can be deleted when the last snapshot that contains the file as “live” data is garbage collected. But this is harder to detect and requires finding the diff of multiple snapshots. It is easier to track what files are deleted in a snapshot and delete them when that snapshot expires.
 
+#### Deletion Files
+
+Deletion files are files that indicate deletions of pre-existing rows to be applied to the dataset at read time. Deletion files may either specify rows by column value or by file name and row position.
+
+1. The file and position based deletion file has the schema as following:

Review comment:
This needs to clearly define what file and position are. I think that `file` should be renamed to `file_path` to match tracking in the manifest file and should use a similar description, along with a note that it must match what's in the manifest. We also need to note that positions start at 0.

* `file_path` - The full URI of a data file, with FS scheme. This must match the `file_path` of the target data file in a manifest entry.
* `position` - The ordinal position of a deleted row in the target data file identified by `file_path`, starting at 0.
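
To illustrate why the two definitions above matter, here is a minimal Python sketch of applying file and position based deletes at read time. It is not Iceberg code: the function names `index_position_deletes` and `live_rows` and the example path are hypothetical, and a data file is modeled as a plain list of rows. It assumes deletes are looked up by exact string match on `file_path` (hence the requirement to match the manifest entry) and that positions are counted from 0.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Sequence, Set, Tuple


def index_position_deletes(
    deletes: Iterable[Tuple[str, int]],
) -> Dict[str, Set[int]]:
    """Group deleted row positions by the data file they target.

    Each input item is a (file_path, position) pair, mirroring the two
    fields proposed in the review comment.
    """
    index: Dict[str, Set[int]] = defaultdict(set)
    for file_path, position in deletes:
        if position < 0:
            # Positions are ordinal and start at 0, so negatives are invalid.
            raise ValueError(f"invalid position {position} for {file_path}")
        index[file_path].add(position)
    return dict(index)


def live_rows(
    file_path: str,
    rows: Sequence[object],
    deletes_index: Dict[str, Set[int]],
) -> Iterator[object]:
    """Yield the rows of one data file that are not marked as deleted.

    The lookup is an exact string match on file_path, which is why the
    delete file must use the same path as the manifest entry.
    """
    deleted = deletes_index.get(file_path, set())
    for position, row in enumerate(rows):  # enumerate counts from 0
        if position not in deleted:
            yield row


if __name__ == "__main__":
    # Hypothetical path, used only for illustration.
    path = "s3://bucket/table/data/a.parquet"
    index = index_position_deletes([(path, 0), (path, 2)])
    print(list(live_rows(path, ["r0", "r1", "r2", "r3"], index)))
```

With a different spelling of the same location (for example, a path without the FS scheme), the lookup finds no deletes and every row is returned, which is the failure mode the "must match the manifest" note guards against.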
