mderoy opened a new pull request, #16668: URL: https://github.com/apache/iceberg/pull/16668
Today you cannot apply a global equality delete to a table unless specsById contains an unpartitioned partition spec (as spec 0 in some cases). This is misleading: you can create the equality delete by passing PartitionSpec.unpartitioned(), but because of a specId collision it is not registered as a global equality delete.

```
PartitionSpec.unpartitioned() is a singleton constant with specId = 0. A table's first partition spec is also assigned specId = 0.

When a global equality delete is written using PartitionSpec.unpartitioned() against a table whose first (and only) spec is partitioned, the delete file stores specId = 0. At read time, specsById.get(0) resolves to the table's partitioned spec rather than the unpartitioned constant, so the delete is misclassified as partition-scoped.

It is indexed under an empty partition key in eqDeletesByPartition rather than in globalDeletes. Since no data file's partition matches an empty key, the delete never applies and deleted rows are incorrectly returned to the reader.
```

Note: I suspect there is a better fix, but I don't think we can reserve specId 0 for unpartitioned, since tables already exist that break that convention. We could maybe use -1, but that would not fix the issue for existing tables. This fix just treats empty partition data as global.
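To make the collision concrete, here is a minimal, self-contained sketch of the lookup described above. It does not use the Iceberg API: `Spec`, `DeleteFile`, `isGlobalBySpecOnly`, and `isGlobalWithEmptyPartitionCheck` are hypothetical stand-ins invented for illustration, and the actual change in this PR may be structured differently.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SpecIdCollisionSketch {
    // Hypothetical stand-in for a partition spec: an id plus its partition field names.
    static final class Spec {
        final int specId;
        final List<String> fields;

        Spec(int specId, List<String> fields) {
            this.specId = specId;
            this.fields = fields;
        }

        boolean isUnpartitioned() {
            return fields.isEmpty();
        }
    }

    // Hypothetical stand-in for an equality delete file: the stored spec id
    // and the partition tuple it was written with (empty for a global delete).
    static final class DeleteFile {
        final int specId;
        final List<Object> partition;

        DeleteFile(int specId, List<Object> partition) {
            this.specId = specId;
            this.partition = partition;
        }
    }

    // Models the unpartitioned constant, which carries specId = 0.
    static final Spec UNPARTITIONED = new Spec(0, List.of());

    // Classification by spec lookup alone: the stored id is resolved against the
    // table's specs, so an id collision hides the fact that the delete is global.
    static boolean isGlobalBySpecOnly(Map<Integer, Spec> specsById, DeleteFile file) {
        Spec spec = specsById.get(file.specId);
        return spec != null && spec.isUnpartitioned();
    }

    // Sketch of the idea in this PR: empty partition data is also treated as global.
    static boolean isGlobalWithEmptyPartitionCheck(Map<Integer, Spec> specsById, DeleteFile file) {
        return file.partition.isEmpty() || isGlobalBySpecOnly(specsById, file);
    }

    public static void main(String[] args) {
        // The table's first (and only) spec is partitioned and was assigned specId = 0.
        Map<Integer, Spec> specsById = new HashMap<>();
        specsById.put(0, new Spec(0, List.of("day")));

        // A global equality delete written with the unpartitioned constant stores specId = 0.
        DeleteFile globalDelete = new DeleteFile(UNPARTITIONED.specId, List.of());

        System.out.println("spec lookup only:      " + isGlobalBySpecOnly(specsById, globalDelete));
        System.out.println("with empty-data check: " + isGlobalWithEmptyPartitionCheck(specsById, globalDelete));
    }
}
```

In this model the spec-only lookup reports the delete as not global, because id 0 resolves to the partitioned spec, while the empty-partition check recovers the intended classification. A delete that carries a non-empty partition tuple is classified the same way by both methods.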
