aokolnychyi commented on pull request #1469: URL: https://github.com/apache/iceberg/pull/1469#issuecomment-697020400
> For a DELETE using position delete files, I think that this isn't quite correct: "data files referenced by new deletes must be still present". The logic for "no validation for delete files" applies to this case: if a data file was deleted, then it's okay to delete the row twice. The validation should be "data files referenced by new deletes must still be present or must be deleted; i.e., cannot be rewritten or overwritten."

What about a copy-on-write update that happened concurrently? It could take a file that my delete file references, rewrite it into a new file, and keep the record I want to remove. If we allow committing a delete file that references a data file that no longer exists, we cannot guarantee that the record we were about to remove is actually removed.

> For a DELETE using equality delete files, I'm not sure that snapshot isolation is distinct. If a data file is added concurrently that has a row that is now deleted, then either that commit is first and the row is deleted or the commit is later and it is appended. Either way, the operations are independent. There is no need to validate "no new potentially matching data files since we read" because there is not necessarily a read, and the delete applies to the data automatically.

Agree.
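
To make the position-delete rule concrete, here is a minimal sketch in Python. It is a toy model, not Iceberg's implementation: `Op`, `Commit`, `ValidationError`, and `validate_position_deletes` are hypothetical names, and a commit is reduced to its operation type plus the set of data files it removed. The check accepts a referenced data file that is still present or was removed by a plain delete, and rejects one that was removed by a concurrent rewrite or overwrite, since its rows may live on in a new file.

```python
from dataclasses import dataclass
from enum import Enum


class Op(Enum):
    """Hypothetical commit kinds for this sketch."""
    APPEND = "append"
    DELETE = "delete"        # data files dropped outright; their rows are gone
    OVERWRITE = "overwrite"  # e.g. a copy-on-write update
    REPLACE = "replace"      # e.g. a rewrite of existing files


@dataclass(frozen=True)
class Commit:
    """A concurrent commit, reduced to what the validation needs."""
    op: Op
    removed_files: frozenset = frozenset()


class ValidationError(Exception):
    pass


def validate_position_deletes(referenced_files, concurrent_commits):
    """Check data files referenced by new position deletes.

    referenced_files: paths of data files the new delete files point at.
    concurrent_commits: commits that landed after the snapshot we read.
    """
    referenced = set(referenced_files)
    for commit in concurrent_commits:
        if commit.op is Op.DELETE:
            # The file is gone and so are its rows: deleting a row
            # twice is harmless.
            continue
        conflicts = commit.removed_files & referenced
        if conflicts:
            # The rows may have been carried into a new file that our
            # position deletes do not reference.
            raise ValidationError(
                f"{commit.op.value} removed referenced data files: "
                f"{sorted(conflicts)}"
            )
```

With this model, a concurrent `Commit(Op.DELETE, frozenset({"data-1.parquet"}))` lets a delete file that references `data-1.parquet` commit, while the same removal under `Op.OVERWRITE` raises `ValidationError`. Equality deletes need no such check in this discussion, because they do not reference specific data files.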
