rdblue commented on a change in pull request #947:
URL: https://github.com/apache/iceberg/pull/947#discussion_r438353365
##########
File path: site/docs/spec.md
##########
@@ -433,8 +433,17 @@ The rows in the delete file must be sorted by `file_path`
then `position` to opt
* Sorting by `file_path` allows filter pushdown by file in columnar storage
formats.
* Sorting by `position` allows filtering rows while scanning, to avoid
keeping deletes in memory.
-Though the delete files can be written using any supported data file format in
Iceberg, it is recommended to write delete files with same file format as the
table's file format.
+Position-based delete files can be written using any supported data file
format in Iceberg, but it is recommended to write delete files with same file
format as the table's default file format.
+#### Equality Delete Files
+
+Equality delete files identify rows in a collection of data files that have
been deleted by encoding equality predicates. Rows may be identified by more
than one column.
Review comment:
Good point. The column cannot be ignored; the delete should still be applied to
any data file with a sequence number before the delete file's, using the same
projection logic. That is, if a column ID is missing from a data file, it is
assumed to be all nulls.
I think some examples would definitely help clarify this as well.
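
As a rough illustration of the rule described above (not Iceberg code), here is a minimal Python sketch. The names `DataFile`, `EqualityDeleteFile`, `project`, and `live_rows` are hypothetical and exist only for this example. It assumes a strict "before" comparison on sequence numbers, as worded in the comment, and it assumes that a null in a delete row matches a null in the projected data row, which the comment does not state.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class DataFile:
    # Hypothetical stand-in for a data file: rows map column ID -> value.
    sequence_number: int
    rows: List[Dict[int, Any]]


@dataclass
class EqualityDeleteFile:
    # Hypothetical stand-in for an equality delete file.
    sequence_number: int
    equality_ids: Tuple[int, ...]   # column IDs the delete compares on
    deletes: List[Tuple[Any, ...]]  # one tuple of values per deleted key


def project(row: Dict[int, Any], equality_ids: Tuple[int, ...]) -> Tuple[Any, ...]:
    # A column ID missing from the data file is read as null (None).
    return tuple(row.get(col_id) for col_id in equality_ids)


def live_rows(data_file: DataFile, delete_file: EqualityDeleteFile) -> List[Dict[int, Any]]:
    # The delete only applies to data files written before it.
    if data_file.sequence_number >= delete_file.sequence_number:
        return list(data_file.rows)
    deleted = set(delete_file.deletes)
    # Assumption: None == None counts as a match in this sketch.
    return [
        row for row in data_file.rows
        if project(row, delete_file.equality_ids) not in deleted
    ]


# Column 2 was added after old_file was written, so old_file has no values for it.
old_file = DataFile(sequence_number=1, rows=[{1: 5}, {1: 6}])
new_file = DataFile(sequence_number=4, rows=[{1: 5, 2: None}])
delete = EqualityDeleteFile(sequence_number=3, equality_ids=(1, 2), deletes=[(5, None)])

remaining_old = live_rows(old_file, delete)
remaining_new = live_rows(new_file, delete)
```

In this sketch the row `{1: 5}` in `old_file` projects to `(5, None)` because column 2 is missing, so it is removed by the delete, while `new_file` is untouched because its sequence number is not before the delete file's.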